Tag Archives: data analysis


Data science automates love

Tinder App is nothing new for anybody since most of us slowly accepted it in our lives but it also brings some displeasure. For instance, this guy thought that it can automate the process in the way of an app that decides if you’d like a person and start a conversation.

Continue reading

Our year in review

From the very beginning, one of the core services TheWebMiner provided was aggregated data and insight into the mobile app landscape. We managed to offer our clients custom aggregated data for all major mobile app marketplaces (iOS AppStore, Google Play, Amazon AppStore etc.) as well as primary analysis on the extracted data.

Continue reading

Big Data and Data Mining Tools

Recently we have tested a Data Mining tool about which i want to write today. It is called Datameer and it’s a cloud app based on Hadoop so we don’t need to install anything on our computers but we must have the data that we want analyzed.

Step 1: Importing the data

To import any kind of data we must select the format of them:


Step 2: A small configuration

Some of which regard data format, others of the way to detect certain data types. This program tries to detect each column’s type being possible to add data types from a file:


Step 3: Some fine adjustments
If the program doesn’t detect the columns well we can do it manually.  A bad of this program is the fact that we can adjust data at this step only by removing of the recordings that won’t correspond to the type of data recently defined.


Step 4:Selecting the sample used for previsualisation


So this is all it is to be done for adding data into Datameer. Further on, an excel-like interface shows all the data .
Here we can find a few buttons responsible for the magic:

Column Dependency
Shows the relation between different columns and basically if a variable depend on other.

Using this we can group similar data.
All the discovering part is done by the program and we only have to specify the number of clusters that we want.

Decision Tree
Builds a decision tree based on the data.

These are all the important function of Datameer, but the true importance of this App relies not on the functions but on the ability of processing a huge quantity of data/

How to make your life easier with Google

Ok, maybe the title is a little bit too optimistic, but today I want to talk about one of the many Google products that makes our daily life better.  Everyone uses Google, either for personal matters or for business interest but how many have heard about The Google Prediction API?

This, as the most of their projects, comes to our help, by learning algorithms to analyze your historic data and predict likely future outcomes. It can be very helpful, especially in the case where big amounts of data are to be handled. You can also say that Big Data is not anymore the future, it’s now and you have to know how to take advantage of it.

Among the uses of Prediction API we can mention, separation of certain types of messages, considering the languages that are written in for specific answers, or spam detection, based on comparison to a lists of already marked spam messages. But maybe the most important use case that we can think of is the purchase prediction, the ability to understand the customer’s behavior and to decide whether or not he is going to make a purchase from your e-commerce business.

In the past, this would have been done using a regression model, being very time consuming and quite hard and this is why I believe that Google Prediction API is one of the tools that will make your life easier and increase profit on your internet business.

Web Scraping’s 2013 Review – part 2

As promised we came back with the second part of this year’s web scraping review. Today we will focus not only on events of 2013 that regarded web scraping but also Big data and what this year meant for this concept.

First of all, we could not talked about the conferences in which data mining was involved without talking about TED conferences. This year the speakers focused on the power of data analysis to help medicine and to prevent possible crises in third world countries. Regarding data mining, everyone agreed that this is one of the best ways to obtain virtual data.

Also a study by MeriTalk  a government IT networking group, ordered by NetApp showed this year that companies are not prepared to receive the informational revolution. The survey found that state and local IT pros are struggling to keep up with data demands. Just 59% of state and local agencies are analyzing the data they collect and less than half are using it to make strategic decisions. State and local agencies estimate that they have just 46% of the data storage and access, 42% of the computing power, and 35% of the personnel they need to successfully leverage large data sets.

Some economists argue that it is often difficult to estimate the true value of new technologies, and that Big Data may already be delivering benefits that are uncounted in official economic statistics. Cat videos and television programs on Hulu, for example, produce pleasure for Web surfers — so shouldn’t economists find a way to value such intangible activity, whether or not it moves the needle of the gross domestic product?

We will end this article with some numbers about the sumptuous growth of data available on the internet.  There were 30 billion gigabytes of video, e-mails, Web transactions and business-to-business analytics in 2005. The total is expected to reach more than 20 times that figure in 2013, with off-the-charts increases to follow in the years ahead, according to researches conducted by Cisco, so as you can see we have good premises to believe that 2014 will be at least as good as 2013.


About sentiment analysis

Hello internet,

As you probably know, we deal everyday with data scraping, which is quite challenging, but, from time to time we tend to ask ourselves what else is there, and especially, can we scrap something else other than data? The answer is yes, we can, and today I am going to talk about how opinion mining can help you.

Opinion mining, better known as Sentiment analysis deals with automatically scan of a text and establishing its nature or purpose. One of the basic tasks is to determine whether the text itself is basically good or bad, like if it relates with the subject that is mentioned in the title. This is not quite easy because of the many forms a message can take.

Also the purposes that sentiment analysis can be to analyze entries and state the feelings it express (happiness, anger, sadness). This can be done by establishing a mark from -10 to +10 to each word generally associated with an emotion. The score of each word is calculated and then the score of the whole text.  Also, for this technique negations must be identified for a correct analysis.

Another research direction is the subjectivity/objectivity identification. This refers to classifying a given text as being either subjective or objective, which is also a difficult job because of many difficulties that may occur (think at a objective newspaper article with a quoted declaration of somebody). The results of the estimation are also depending of people’s definition for subjectivity.

The last and the most refined type of analysis is called feature-based sentiment analysis. This deals with individual opinions of simple users extracted from text and regarding a certain product or subject. By it, one can determine if the user is happy or not.

Open source software tools deploy machine learning, statistics, and natural language processing techniques to automate sentiment analysis on large collections of texts, including web pages, online news, internet discussion groups, online reviews, web blogs, and social media. Knowledge-based systems, instead, make use of publicly available resources to extract the semantic and affective information associated with natural language concepts.

That was all about sentiment analysis that TheWebMiner is considering to implement soon. I hope you enjoyed and you learned something useful and interesting.