Author Archives: Bogdan Balcan

The science behind an internet request

Altruism takes many shapes on the internet, especially on sites designed for user interaction, such as blogs, forums or social networks. Reddit even has a dedicated section, Random Acts of Pizza, devoted to giving free pizza to strangers whose story is worth one. It is fun, and the motto is simple: “because … who doesn’t like helping out a stranger? The purpose is to have fun, eat pizza and help each other out. Together, we aim to restore faith in humanity, one slice at a time.”

This great opportunity raises an obvious question, though: what should one say to get free pizza and, more broadly, what should one say to get any kind of free stuff on the internet? A possible answer comes once again from the science of data mining. Researchers at Stanford University analyzed this intriguing problem, limiting themselves to Reddit posts.

The researchers mined all of the section’s posts from 2010 onward and passed them through filters such as sentiment analysis, politeness and, most importantly, whether the request was successful or not. From this, a pattern emerged.

The resulting model predicts success with up to 70% accuracy. Besides the sociological observations, such as the positive effect of longer posts or the negative effect of very polite posts, it is interesting to look at the algorithm that made all this possible. It divides the narratives into five types, those that mention: money; a job; being a student; family; and a final group that includes mentions of friends, being drunk, celebrating and so on, which the team called “craving.”
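As a rough illustration of how posts could be sorted into those five narrative types, here is a minimal Python sketch based on keyword matching. The keyword lists and the `narrative_types` function are assumptions invented for this example; they are not the lexicons or the method the researchers used.

```python
import re

# Hypothetical keyword lists, invented for illustration only.
# The study's actual word lists are not reproduced here.
NARRATIVE_KEYWORDS = {
    "money": {"money", "rent", "bills", "broke", "paycheck"},
    "job": {"job", "work", "unemployed", "fired", "interview"},
    "student": {"student", "college", "semester", "finals", "tuition"},
    "family": {"family", "mom", "dad", "kids", "wife", "husband"},
    "craving": {"friend", "friends", "drunk", "celebrate", "celebrating", "party"},
}


def narrative_types(post):
    """Return the sorted names of every narrative type whose keywords appear in the post."""
    # Lowercase the post and split it into a set of distinct words.
    words = set(re.findall(r"[a-z']+", post.lower()))
    # A post belongs to a type if it shares at least one word with that type's list.
    return sorted(name for name, keywords in NARRATIVE_KEYWORDS.items() if words & keywords)


if __name__ == "__main__":
    print(narrative_types("I lost my job and rent is due"))
```

A post can fall into several types at once, which is why the function returns a list. In a real analysis these labels would only be one set of features, alongside signals such as post length or politeness.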

This study plays an important role in analyzing how people behave toward their peers on the internet, and it opens a wide area of research for better understanding online consumers around the world.



Challenging users to data science

A well-known problem in the data science world is the mismatch between the people who have the data and the people who know how to use it. Data scientists, for their part, complain about the difficulties of the scraping process and, more precisely, the difficulties of obtaining data. Kaggle was created to address this mismatch by mediating between data owners and analysts.

The platform was born on these principles. It sets up competitions in which users submit solutions to a variety of data sets in order to win points and, in the end, money.

On the other side, the uploader of the data receives a number of possible analyses of the data sets and can choose the one best suited to their interests.

A very interesting case study, and a powerful demonstration of Kaggle’s capabilities, is the platform’s collaboration with NASA and the Royal Astronomical Society. The challenge was to find an algorithm for measuring the distortions in images of galaxies, to help scientists prove the existence of dark matter. Reportedly, within a week of the project’s start, participants had matched the accuracy of the algorithms provided by NASA, which came out of studies begun back in 1934 and continued up to that time. Moreover, within three months, a user provided an algorithm that was more than 300% more accurate than any previous version. The whole case study can be found here.

Essentially, the fun thing about Kaggle is that the winners of the competitions are folks around the world with a knack for problem solving, and not always degrees in mathematics. Degrees don’t matter on Kaggle; all that matters is results.



Enigma Analytics

Without any introduction, we can say that Enigma is a tool no data enthusiast should ignore. First introduced to the general public at TechCrunch Disrupt NY 2013, where the start-up was the grand winner, it has gained popularity thanks to its ease of use and the wide availability of its content.

Enigma allows its users to explore a vast amount of data that is publicly available but not easy to obtain. The service pulls its data from more than 100,000 data sources. A major advantage is the deceptively simple process of sifting through all that information: a quick search for a person’s name or a company brings up multiple detailed sources of information, and jumping in and playing with the data is thoughtfully executed.

The simple design and the usefulness of having this information in one place have already brought the company partnerships with the Harvard Business School, research firm Gerson Lehrman Group, S&P Capital IQ, and newly minted strategic investor the New York Times.

Although it has proven itself a very useful tool, Enigma has its ups and downs. The biggest downside is that it only includes databases collected from the American government and American local authorities. That is great because those datasets are public and free, but they are not very useful for researchers from other countries, unless they are studying their country’s relations with America. Second, its simple design can be a bit confusing at first, because it is a new type of application and not all of its functions are clear. This can be avoided by visiting the support section before browsing through the site.

All in all, our verdict is that Enigma is a great app if you are interested in American public data that is not easy to obtain otherwise.

Companies join forces against FCC

After years of pressure from ISPs, net neutrality is under threat by the FCC itself. Chair Tom Wheeler promised to revive the Open Internet Order after it saw an unceremonious defeat in January, but a leaked version of his latest proposal would let companies pay ISPs for a “fast lane” to subscribers, undermining the spirit of the original rules, which barred companies from discriminating between services. Despite Wheeler’s reassurances, this new proposal is the exact opposite of net neutrality. It could undermine both the companies of today and the startups of tomorrow. It might also be exactly the push activists need to fight back, according to The Verge.

As The Washington Post reports, more than 150 internet firms are protesting in a letter to the Federal Communications Commission. The companies asked federal regulators to reconsider a proposal that critics fear would allow Internet providers to charge for faster, better access to consumers. The list includes Amazon, Facebook, Google and Microsoft, along with dozens of other firms that called the prospect of paid fast lanes “a threat to the Internet.”

With just a week to go before the Federal Communications Commission meets to consider its proposed new rules for ISPs, the letter represents a late attempt by Silicon Valley to take a stance on the open Internet.

“Instead of permitting individualized bargaining and discrimination,” the companies wrote, “the commission’s rules should protect users and Internet companies on both fixed and mobile platforms against blocking, discrimination and paid prioritization, and should make the market for Internet services more transparent.”

The main question is whether a slow-down protest would have any impact. But it is undoubtedly worth starting a broader conversation about what the Internet community can do together to protest the FCC’s proposed rules.

Business Manager from Facebook

Now that the holiday is over, we can get back to business with high stamina and little desire to stop. The big news this week is that Facebook has presented its new Business Manager for entrepreneurs and marketers. It is meant to ease the work of running different ad campaigns, and even multiple pages, from one account, a well-received feature. It also lets you assign tasks to different users on the same project by adding or revoking permissions.

To use Business Manager, you first have to set up your account, which makes you a so-called “business admin”, a position in which you can modify all aspects of the business. Another new feature is that you can assign tasks even to people who are not your Facebook friends and about whom you know nothing more than a work email address.

Even though it seems like a very well-intended tool, TechCrunch points out that it is a bit of a reach for Facebook. Even so, the ability of small business owners, who don’t have big budgets, to be successful on Facebook will directly correlate to their ability to quickly manage their pages, content, and ads to drive engagement.
We can only hope that Facebook continues releasing tools that make it easy for small businesses (not just Fortune 500 brands) to have success on the platform.

Twitter’s redesigned pages 101

If you’ve been active on Twitter lately, you may have noticed the major changes on the micro-blogging platform. Since it was founded in 2006, Twitter has distinguished itself from other platforms by revolutionizing the way people communicate on the internet. By setting a 140-character limit, it forced people to deliver only the essentials of their message, and so it reinvented the wheel.

In the last few days Twitter has rolled out a new design for user profiles, and it is nothing like what we are used to. Instead of the regular, plain customizable profile, we have a Facebook-like page with the main picture and bio moved to the left and significantly more real estate dedicated to the header photo. The new look has a greater focus on photos and content cards.

The change in looks also brought a new category with it, called “out on the profile”, where you can view who you are following, your followers, favorites and lists.

Major brands did not miss the opportunity to promote themselves alongside these changes and responded almost immediately by redesigning their pages (in the photos: Samsung and Kia).


If the new profile hasn’t reached you yet, there is nothing to worry about: the changes are being rolled out to users gradually and in random order, so your turn will come, with more followers too, we hope.

Data encryption is the key for protection



Edward Snowden came to our attention once again this March, when he appeared as a speaker at two prominent technology conferences: SXSW and TED. The world’s most famous whistleblower has said it before, but he reiterated for the SXSW crowd that end-to-end encryption would go a long way towards protecting user data from both spying and attackers, although it will never be completely effective, no matter how hard we try.

He once again addressed the data harvesting that companies carry out against our will on our emails and files in order to serve us the best-targeted ads. Normally, many services encrypt data at either end of the communication, but companies often decrypt the data again along the way in order to tailor what they show to users.

Encryption tools such as Tor and PGP (Pretty Good Privacy) are available on the web, but they are well known for the difficulties non-technical people face when installing and using them. On top of this, it has been stated plainly that if your government wants to investigate your data, it can do so; encryption is only an inconvenience for mass data gathering.

Finally, we remind you that Snowden revealed compromising data about the United States government, including its mass data mining program known as PRISM, whose purpose is to gather data from all media sources. It is estimated that PRISM accounts for 91% of the NSA’s internet traffic. Another program revealed was XKeyscore, initially used by the United States but later shared with numerous other governments in order to give a worldwide view of internet searches and traffic.



TheWebMiner in French

Good day everyone, or should I say bonjour, because this week we launched the French version of our site.

The need for data is certainly increasing every day, in every possible direction, and we want to keep up with this trend. Although English is the language of the internet, we also want to reach users from smaller markets who might need our services, and because French is the official language in 29 countries, it seemed an obvious choice.

So, from now on, along with the English version and the Romanian one (Romania being our company’s home country), a third version is available in the language menu in the upper right corner of our site.


We hope you will enjoy the experience and give us good feedback on our expansion.


Data Cleaning

If you are really into data mining, maybe you have wondered what happens to data after it is extracted: does it get delivered the way it is, or is there more to it?

The truth is that extraction is only one part of the process and it is followed by several others, including Data Cleaning, the subject of today’s article.

The need for such a process has always been present in scientific fields, where misleading results can lead to false conclusions and defeat the initial purpose. Automation, however, has arrived relatively recently, in the last two decades, when cleaning had to be applied to very large quantities of data.

For data to be considered of high quality it must fulfill a series of requirements such as:

  • Validity: the degree to which the data conforms to the usual business constraints. This is relatively easy to ensure by setting up specific checks such as data-type constraints, range constraints or mandatory constraints.
  • Decleansing: detecting errors and syntactically removing them for better programming.
  • Accuracy: the degree of conformity of a measure to a standard or a true value; this also requires an external set of data for comparison.
  • Completeness: the percentage of all required measures that are known.
  • Consistency: the degree to which a set of measures is equivalent across systems.
  • Uniformity: ensuring that all measurements use the same measurement units, along with some aspects of validation.
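A few of these requirements can be sketched in plain Python. The example below is only an illustration under stated assumptions: the sample records, the field names, the 0 to 120 age range and the helper functions are all hypothetical, invented here to show validity checks (mandatory, data-type and range constraints), a completeness percentage and a uniformity fix (converting heights to one unit).

```python
# Hypothetical records and rules, invented for illustration only.
RECORDS = [
    {"name": "Ana", "age": 34, "height": 1.70, "unit": "m"},
    {"name": "Dan", "age": -5, "height": 182.0, "unit": "cm"},
    {"name": None, "age": 51, "height": None, "unit": "m"},
]
REQUIRED = ("name", "age", "height")


def validity_errors(record):
    """Check mandatory, data-type and range constraints for one record."""
    errors = []
    # Mandatory constraint: every required field must have a value.
    for field in REQUIRED:
        if record.get(field) is None:
            errors.append(f"{field}: missing")
    age = record.get("age")
    if age is not None:
        # Data-type constraint: age must be an integer.
        if not isinstance(age, int):
            errors.append("age: not an integer")
        # Range constraint: assumed plausible range of 0 to 120.
        elif not 0 <= age <= 120:
            errors.append("age: out of range")
    return errors


def completeness(records):
    """Percentage of required values that are actually known."""
    total = len(records) * len(REQUIRED)
    known = sum(1 for r in records for f in REQUIRED if r.get(f) is not None)
    return 100.0 * known / total


def to_metres(record):
    """Uniformity: return a copy of the record with height expressed in metres."""
    height = record.get("height")
    if height is not None and record.get("unit") == "cm":
        return {**record, "height": height / 100.0, "unit": "m"}
    return record


if __name__ == "__main__":
    for rec in RECORDS:
        print(rec["name"], validity_errors(rec))
    print(f"completeness: {completeness(RECORDS):.1f}%")
    print(to_metres(RECORDS[1]))
```

Accuracy and consistency are left out on purpose: as noted in the list, they need an external reference data set or a second system to compare against.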

This research area still has some way to go before all the challenges it poses are solved. Today, problems such as error correction and the loss of information it causes, or the maintenance of cleansed data, still create serious issues. But with the advance of Big Data and the interest that big companies such as IBM and Oracle are showing in this field, we can be optimistic and say that we are on the right track.

Why emotions are important in marketing

I don’t know about you, but I haven’t given much thought to how I feel the instant I press the “share” button. I recently found out that I was ignoring a far more important part of online marketing than it seems, and I corrected myself, all because of emotions.

When it comes to what we feel, everything can be expressed as a sum of four basic emotions: happy, sad, afraid and angry. These combine to form a variety of other feelings, of which we may or may not be aware. To understand this better, we can look at Robert Plutchik’s famous “wheel of emotions”, which shows just some of the well-known emotional layers.


Studies have also revealed that the emotional state that earns the most likes is happiness, which makes sense if we consider that our first emotional action in life is to respond to our mother’s smile with a smile of our own. Joy and happiness are hard-wired into all of us, as the psychoanalyst Donald Winnicott discovered. And because happiness almost never occurs in isolation, the top 10 emotions people feel when sharing something are positive ones, according to a study by Fractl.



Moreover, Jonah Berger, professor of marketing at the University of Pennsylvania’s Wharton School and author of Contagious: Why Things Catch On, conducted a study in which he found that the more positive an article was, the more likely it was to go viral.


Of course, we shouldn’t forget that other feelings also influence our online behavior. For instance, sadness helps us connect and empathize by producing cortisol, known as the “stress hormone”, and oxytocin, a hormone that promotes connection and empathy. Further research revealed that when we are angry, the hypothalamus makes us more stubborn, and that fear only makes us more desperate to find something or someone to cling to.

Considering all this, it is easy to understand the significance of emotions in marketing, especially in light of an analysis of the IPA dataBANK, which contains 1,400 case studies of successful advertising campaigns: campaigns with purely emotional content performed about twice as well (31% vs. 16%) as those with only rational content.

And this is why we can’t underestimate the importance of understanding the science of emotion in marketing!