Category Archives: Data & Research

Data Cleaning

If you are really into data mining, you may have wondered what happens to data after it is extracted: does it get delivered as it is, or is there more to it?

The truth is that extraction is only one part of the process and it is followed by several others, including Data Cleaning, the subject of today’s article.

The need for such a process has always existed in scientific fields, where misleading results can lead to false conclusions and defeat the original purpose of the work. Automation, however, arrived relatively recently, in the last two decades, when cleaning had to be applied to very large quantities of data.

For data to be considered of high quality it must fulfill a series of requirements such as:

  • Validity: the degree to which the data conform to the usual business constraints. This is relatively easy to ensure by setting up specific checks such as data-type constraints, range constraints or mandatory constraints.
  • Decleansing: detecting errors and syntactically removing them for better programming.
  • Accuracy: the degree of conformity of a measure to a standard or a true value; verifying it also requires an external set of data for comparison.
  • Completeness: the percentage of required measures that are known.
  • Consistency: the degree to which a set of measures is equivalent across systems.
  • Uniformity: ensuring that all measurements use the same units, along with some aspects of validation.
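
As a rough illustration of the validity and completeness requirements listed above, here is a minimal Python sketch. The field names, the limits in RULES and the helper functions are all hypothetical, chosen only for this example; real business constraints would come from your own data.

```python
from typing import Any

# Hypothetical validity rules, for illustration only: each field has a
# data-type constraint, an optional range constraint, and a mandatory flag.
RULES = {
    "age": {"type": int, "min": 0, "max": 120, "mandatory": True},
    "email": {"type": str, "mandatory": True},
    "height_cm": {"type": float, "min": 30.0, "max": 250.0, "mandatory": False},
}


def validate(record: dict[str, Any]) -> list[str]:
    """Return a list of validity problems found in one record."""
    errors = []
    for field, rule in RULES.items():
        value = record.get(field)
        if value is None:
            # Mandatory constraint: the value must be present.
            if rule["mandatory"]:
                errors.append(f"{field}: missing mandatory value")
            continue
        # Data-type constraint.
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        # Range constraints, applied only when the rule defines them.
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{field}: above maximum {rule['max']}")
    return errors


def completeness(records: list[dict[str, Any]], fields: list[str]) -> float:
    """Percentage of the required values that are actually known."""
    total = len(records) * len(fields)
    known = sum(1 for r in records for f in fields if r.get(f) is not None)
    return 100.0 * known / total if total else 100.0
```

Calling validate on each extracted record gives a list of problems to fix or filter out, while completeness summarizes how many of the required values are present across the whole data set.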

This research area still has a way to go before all the challenges of optimization are solved. Today, problems such as error correction and the loss of information it can cause, or the maintenance of cleansed data, still create serious issues. But with the advance of Big Data and the interest that big companies such as IBM and Oracle are showing in this field, we can be optimistic and say that we are on the right track.

Why emotions are important in marketing

I don’t know about you, but I haven’t given much thought to how I feel the instant I press the “share” button. I recently found out that I was ignoring a much more important part of online marketing than it seems, and I corrected myself, all because of emotions.

When it comes to what we feel, everything can be expressed as a combination of four basic emotions: happy, sad, afraid and angry. These combine to form a variety of other feelings, of which we may or may not be aware. For a better understanding, we can look at Robert Plutchik’s famous “wheel of emotions”, which shows just some of the well-known emotional layers.

Studies have also revealed that the emotional state that “gets” the most likes is happiness, which is natural if we consider that our first emotional action in life is to respond to our mother’s smile with a smile of our own. Joy and happiness are hard-wired into all of us, as discovered by the psychoanalyst Donald Winnicott. And because happiness rarely comes alone, the top 10 emotions that people have when sharing something are made up of positive ones, as studied by Fractl.

[Figure: top 10 emotions people have when sharing content, as studied by Fractl]

Moreover, Jonah Berger, professor of marketing at the University of Pennsylvania’s Wharton School and author of Contagious: Why Things Catch On, conducted a study in which he found that the more positive an article was, the more likely it was to go viral.

Of course, we shouldn’t forget that other feelings also influence our online behavior. For instance, sadness helps us connect and empathize by producing cortisol, known as the “stress hormone”, and oxytocin, a hormone that promotes connection and empathy. Further research revealed that when we are angry the hypothalamus makes us more stubborn, and that fear only makes us more desperate to find something or someone to cling to.

Considering all this, it is easy to understand the significance of emotions in marketing, especially in light of an analysis of the IPA dataBANK, which contains 1,400 case studies of successful advertising campaigns: campaigns with purely emotional content performed about twice as well (31% vs. 16%) as those with only rational content.

And this is why we can’t underestimate the importance of understanding the science of emotion in marketing!

 

How to use regex in Vim?

We often need to process big text files (larger than 100 MB), and we have found that the best text editor for the job is Vim, or gVim on Windows. Regular expressions (also called RegEx) are a powerful way to process text automatically.

Using RegEx in Vim

Vim uses its own regex syntax, which differs from the Perl-style syntax that most tools treat as standard, so we built a tool that converts standard regex to Vim regex. The tool is available here: RegEx to Vim.

We hope you find it useful.
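
To give an idea of what such a conversion involves, here is a minimal Python sketch. It is not the code behind our tool; the function name pcre_to_vim is made up for this example, and it covers only a small subset of the differences: in Vim's default "magic" mode, the characters + ? | ( ) { need a backslash to act as operators, whereas in Perl-style regex they are operators when written bare. Things like non-greedy quantifiers, (?:...) groups, \b and a literal ] placed first in a character class are not handled.

```python
# Characters that are operators when bare in Perl-style regex but need a
# leading backslash in Vim's default "magic" mode.
NEEDS_BACKSLASH_IN_VIM = set("+?|(){")


def pcre_to_vim(pattern: str) -> str:
    """Translate a small subset of Perl-style regex to Vim "magic" syntax."""
    out = []
    i = 0
    in_class = False  # True while we are inside a [...] character class
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if not in_class and nxt in NEEDS_BACKSLASH_IN_VIM:
                # An escaped operator (a literal) is written bare in Vim.
                out.append(nxt)
            else:
                # Other escapes such as \d or \w are copied unchanged.
                out.append(ch + nxt)
            i += 2
            continue
        if in_class:
            # Inside a character class everything is copied as-is.
            out.append(ch)
            if ch == "]":
                in_class = False
        elif ch == "[":
            out.append(ch)
            in_class = True
        elif ch in NEEDS_BACKSLASH_IN_VIM:
            # A bare operator needs a backslash in Vim.
            out.append("\\" + ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(pcre_to_vim(r"(foo|bar)+\d{2,3}"))
```

The printed pattern can then be pasted after / or into a :s command in Vim. An alternative worth knowing is Vim's "very magic" prefix \v, which makes most of these operators work without backslashes.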

Why do you need Facebook for your business?

This might be a relatively simple question but the complexity of the answer might surprise you!

First of all, if you have a business and you don’t have a Facebook page for it, well, I’m sorry to tell you that you might be among the last ones who don’t. Gone are the times when not everyone was on this social platform. Now users concentrate on adding everything to it, from everywhere: businesses, places, trends, events, personalities and many other aspects of daily life, all with the purpose of simplifying our actions.

People tend to be skeptical about the success of a business page, but what they fail to understand is that even a small page with a small audience can make a difference. Research shows that merely six percent of all Facebook pages have more than ten thousand likes, and that is not a problem. Given time, the popularity of a page will grow, and more and more users will be interested in the information you provide. Another reason people avoid having a social media page for their business is that it’s hard to keep it updated all the time. Although constant posts will keep your fans happy, there is no direct correlation between posting frequency and the growth of the page.

On the other hand, there are plenty of reasons why your business needs Facebook. What it boils down to, though, is that this is a free opportunity to reach out to your audience in their preferred environment, improve your SEO rankings and visibility, and show off your business in a way that people can relate to. That being said, we want to introduce you to TheWebMiner Facebook page, where we constantly post technical news, updates about our tool and tips for our field of activity.

Smartphones Are Taking Over

One of the most important consumer electronics events is taking place these days in Barcelona. I am referring, of course, to Mobile World Congress, where the most exciting new technologies are being revealed. Manufacturers have to keep up with the demands of the market by producing more reliable and cheaper devices for everyday use.

Smartphones play one of the leading roles at this congress, as demand for them has grown unexpectedly in recent years. By the end of 2013 more than two thirds of mobile subscribers in the US had such a device, and it seems that in a few years conventional mobile phones will be forgotten.

Although smartphone penetration is interesting to follow, an even more interesting observation derives from it: the amount of time people spend browsing the web on their smart devices has also grown significantly, and in the US it has already exceeded web usage on computers. According to a survey conducted by nielsen.com, the average American spends 34 hours per month on his mobile and almost 27 hours on his computer. In Europe the gap is even wider, with almost 42 hours spent by the average UK user on his mobile versus 28 hours on a computer. The average Italian spends 37 hours a month on his mobile and only 18 on a computer, and the list goes on.

Not only are consumers spending more time on their phones, but it seems that they are unable to put them down: the number of times a user accesses his smartphone during a normal day rose from 5.5 at the beginning of 2013 to 9 in December.

Given all this, we can only predict even greater growth in usage, driven by friendly interfaces and useful applications, until gadgets have minimized our interaction with our old, trustworthy computers.

Experimentation is a Must

[Figure: word cloud]

This word cloud represents the answers given by more than 600 representatives of different online companies to the question “Which areas are you going to be experimenting in most heavily in the coming year?”

Experimenting is one of the key tools of marketing engineering, and although the results of test-and-learn approaches are more widely appreciated, establishing the most appropriate culture is what holds most companies back, if we set aside the fear of failure.


What are the most exciting opportunities for companies this year?

Hello everybody. Today I want to share with you new research conducted by Adobe that we recently got our hands on. It is a perspective on the digital world and a briefing on what to expect from this field in 2014. If last year was about recognizing the importance of customer experience, this one is about actually doing something. For an optimal customer experience, various business functions and customer-facing touch points need to work in harmony, from customer service and advertising to online user experience, content management and email messaging. To the question “Which one area is the single most exciting opportunity for your organization (or for your clients) in 2014?” the responses were diverse but followed a well-defined pattern.

We know the number of company respondents (980) and of agency respondents (1202), and from this we conclude that organizations need to ensure they have the right data, technology and culture to act as the foundation for a great customer experience, with a focus on multichannel marketing and campaign management also required to underpin a successful approach.

Mobile, the second most exciting opportunity in the eyes of client-side respondents and first on the list for supply-side participants, has a well-earned position. We wrote about this in another post a few weeks ago, so it is no surprise at all. Despite the importance of mobile and the prominence of smartphones and tablets in our lives, many companies are still trying to work out how to optimize their websites for mobile, for example whether they should go down the ‘responsive’ route or not.

A second figure that I’d like to discuss is less about the big picture and more about specific disciplines. The question is: which three digital-related areas are the top priorities for your organization (or for your clients) in 2014?


It seems that marketers and digital professionals are clear on what the priorities are, and this has not changed markedly in the last year: the top five options are in exactly the same order as in last year’s survey, namely content marketing, social media engagement, targeting and personalization, conversion rate optimization and mobile optimization.

Data Grants from Twitter

There is a lot of fuss these days about the announcement Twitter made on February 5. The company is encouraging research institutions to apply by March 15, in what seems to be a scientific lottery, for a chance to access Twitter’s data sets. Around 500 million Tweets are sent out each day, and if they were analyzed scientifically, topics such as where the flu may hit, health-related information or events like ringing in the new year could be studied from a statistical point of view and outcomes could be predicted.

Twitter acknowledges the difficulties researchers face when collecting data from the platform, and so it created this project, the Twitter Data Grants, aiming for a better connection between research institutions or academics and the data they need. Along with the data itself, the company will offer the selected institutions the chance to collaborate with its own engineers and researchers, all with the help of Gnip, one of Twitter’s most important certified data reseller partners.

How to make your life easier with Google

OK, maybe the title is a little too optimistic, but today I want to talk about one of the many Google products that make our daily life better. Everyone uses Google, either for personal matters or for business, but how many have heard of the Google Prediction API?

Like most of their projects, it comes to our aid, using machine learning algorithms to analyze your historical data and predict likely future outcomes. It can be very helpful, especially when large amounts of data have to be handled. You could also say that Big Data is no longer the future; it is here now, and you have to know how to take advantage of it.

Among the uses of the Prediction API we can mention sorting messages by the language they are written in so that each gets a specific answer, or spam detection based on comparison with a list of messages already marked as spam. But perhaps the most important use case we can think of is purchase prediction: the ability to understand a customer’s behavior and to decide whether or not he is going to make a purchase from your e-commerce business.
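
To make the spam detection idea more concrete, here is a toy Python sketch of learning from messages that are already labeled and then predicting the label of a new one. It does not call the Prediction API at all and is not how Google implements it; it is a tiny naive Bayes classifier written from scratch, and the function names and training messages are invented for the example.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label from a list of (label, text) pairs."""
    word_counts = {}
    label_counts = Counter()
    for label, text in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def predict(model, text):
    """Return the most likely label for text (naive Bayes, add-one smoothing)."""
    word_counts, label_counts = model
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the log prior: how common this label is overall.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data: messages already marked as spam or not ("ham").
model = train([
    ("spam", "win free money now"),
    ("spam", "free prize claim now"),
    ("ham", "meeting at noon tomorrow"),
    ("ham", "lunch tomorrow with the team"),
])
print(predict(model, "free money"))
```

A hosted prediction service follows the same two steps, training on historical data and then asking for a prediction, but takes care of the model choice and the scaling for you.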

In the past this would have been done with a regression model, which is very time-consuming and quite hard, and this is why I believe that the Google Prediction API is one of the tools that will make your life easier and increase the profit of your internet business.

Facebook tomorrow!

There comes a time in each of our lives when we wonder, whether out of curiosity or for perspective, what the next big thing is going to be, and because this is a blog dedicated to science we are going to restrict ourselves to this area.

Of course we can’t know what the technology of tomorrow is going to be, but we can tell you what it is not going to be: Facebook! According to engineers at Princeton, Facebook is very likely to come to an end in the next few years. For their research they used an epidemiological model whose curve is similar to a Gaussian bell but which is more complex, since it describes the transmission of a communicable disease between individuals. In the chosen model, called SIR, the total population equals the sum of Susceptible, Infected and Recovered persons. They chose this model because it is relevant for phenomena with a relatively short life span. They then applied it to the case of MySpace and noticed that it fit almost perfectly.
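
For the curious, the textbook SIR model can be simulated in a few lines of Python. This is only a sketch of the classic model, not necessarily the exact variant the researchers fitted, and the parameter values (beta for the infection rate, gamma for the recovery rate, a population of 1000) are arbitrary assumptions picked for illustration. Read "infected" as "active user" and "recovered" as "user who has left".

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Integrate the classic SIR equations with simple Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    where N = S + I + R stays constant.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(round(days / dt)):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Arbitrary illustrative values: 1 initial "infected" user out of 1000.
history = simulate_sir(s0=999, i0=1, r0=0, beta=0.5, gamma=0.1, days=200)
peak_infected = max(i for _, i, _ in history)
final_s, final_i, final_r = history[-1]
print(f"peak infected: {peak_infected:.0f}, infected at the end: {final_i:.0f}")
```

The infected curve rises, peaks and then falls away, which is the rise-and-decline shape that the study applies to social networks.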

[Figure: graph from the Princeton study]

We can easily see in this graph that the decline of Facebook has already begun, but the end is not as near as expected. In fact, we can be sure that it will not disappear from our lives before 2018, although the internet can be a very unpredictable place and no one can determine exactly how it is going to end.

We also advise you not to take this study for granted because, as we found out, it was conducted by researchers based in the school’s department of mechanical and aerospace engineering. We are not saying that they are not professionals, but they are not experts in social studies of this kind.