Monthly Archives: June 2014

Processing large text files

At TheWebMiner we often need to process large text files, and by large I mean files of a few hundred megabytes or more. Of all the text processing tools we have tested so far, we concluded that the best is Vim, or gVim (the graphical version of this famous editor, commonly used on Windows).

Regular Expressions

Another useful tool for file processing is regular expressions, or, more simply, RegEx. These expressions let us automatically find, or find and replace, pieces of text that match a certain pattern. Combining the two tools, however, raises a new problem.

How do we use regular expressions in Vim?

Vim has its own RegEx syntax, so standard regular expressions cannot be used in Vim as they are. For this purpose we have created a converter and put it at your disposal. You can find the converter on our site, and we hope it will be of help.
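To give an idea of what such a conversion involves, here is a minimal Python sketch. It is a hypothetical helper (`to_vim_magic` is an illustrative name), not the converter mentioned above, and it only covers the most common differences: in Vim’s default "magic" mode, grouping, alternation and the `+`, `?` and `{n,m}` quantifiers need a backslash to be special, while some escaped sequences that are plain literals in standard RegEx, such as `\<` or `\=`, mean something else in Vim. Lazy quantifiers, non-capturing groups, lookarounds and `\b` are not handled.

```python
"""Minimal sketch of a Python/ERE-style -> Vim "magic" regex translator.

Hypothetical helper for illustration only; it is not TheWebMiner's
converter and handles only the most common syntax differences.
"""

# Bare in Python/ERE, but need a backslash in Vim's default (magic) mode.
# "?" maps to Vim's "\=" (zero or one); a bare "~" is escaped because in
# Vim it would match the last substitute string instead of a literal "~".
_TO_VIM = {"+": "\\+", "?": "\\=", "|": "\\|", "(": "\\(", ")": "\\)",
           "{": "\\{", "~": "\\~"}

# Escaped (literal) in Python/ERE; written bare in Vim, where "\X" is special.
_UNESCAPE = set("+?|(){}<>=@%&")


def to_vim_magic(pattern: str) -> str:
    """Translate a simple Python-style regex into Vim magic-mode syntax."""
    out = []
    i = 0
    in_class = False  # True while inside a [...] bracket expression
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if not in_class and nxt in _UNESCAPE:
                out.append(nxt)  # e.g. "\+" (literal plus) -> "+"
            else:
                out.append(c + nxt)  # "\d", "\w", "\.", "\\" etc. kept as-is
            i += 2
            continue
        if in_class:
            out.append(c)  # bracket contents are copied unchanged
            if c == "]":
                in_class = False
        elif c == "[":
            # A "]" right after "[" or "[^" is a literal member, not the end.
            j = i + 1
            if j < len(pattern) and pattern[j] == "^":
                j += 1
            if j < len(pattern) and pattern[j] == "]":
                j += 1
            out.append(pattern[i:j])
            in_class = True
            i = j
            continue
        else:
            out.append(_TO_VIM.get(c, c))
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_vim_magic(r"(foo|bar)+"))      # \(foo\|bar\)\+
    print(to_vim_magic(r"\d{3}-\d{4}"))     # \d\{3}-\d\{4}
    print(to_vim_magic(r"colou?r \(v1\)"))  # colou\=r (v1)
```

The output can then be used in a substitution such as `:%s/\(foo\|bar\)\+/X/g`. Alternatively, Vim’s `\v` ("very magic") prefix makes its syntax much closer to standard regular expressions, though characters like `<`, `>` and `=` then become special and must be escaped.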

Get ready to adapt your business to the future!




Recently, while browsing the internet, I stumbled upon an article that captured my full attention. Articles about how life is going to change in the near future, and how technology is becoming more and more a part of our lives, are easy to find, but this article focuses on business development in the near and more distant future in all its aspects, some of which I want to share with you.

First of all, we want to remind you that change is mandatory. Keeping the same business plan for a long time can only lead to stagnation and, in the end, to failure, as seen in the cases of Kodak or, more recently, BlackBerry.

With these examples in mind, we must not be afraid to embrace promising new technologies and stay ahead of competitors. The job market will also evolve, creating new jobs that sound simply weird today, like Nostalgist, Simplicity Expert or End-of-Life Therapist, while automation is predicted to replace more than 2 billion of today’s jobs.

In the advertising sector, the revolution has already become part of our lives: targeting algorithms are designed to deliver the most relevant commercials to each of us, and they may end up knowing us better than our closest ones do.

There is much more to tell about how things are predicted to change in the next few decades, and we can’t stay ahead of everything, but the least we can do is be prepared and optimistic about change. After all, curiosity is our greatest gift.




Prescriptive Analytics is what really matters

I don’t know how much you have heard about Prescriptive Analytics, because it is not as popular as Descriptive and Predictive Analytics, but it certainly has the power to change how we treat Big Data.

Put bluntly, Prescriptive Analytics is the new name for the step from analytics to knowledge in the data-to-knowledge pyramid. It builds on Predictive Analytics, the next step up in data reduction, which uses a variety of statistical, modeling, data mining and machine learning techniques to study recent and historical data, allowing analysts to make predictions about the future. As we know, big data brings a huge amount of information, most of which is useless, hence the need for this new service.

The purpose of this kind of analytics is not to tell you what is going to happen in the future but, because of its probabilistic nature, to inform you of what MIGHT happen. It is based on a predictive model with two additional components: actionable data and a feedback system that tracks the outcome produced by the action taken.
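The three ingredients named above (a predictive model, actionable data and a feedback system that tracks outcomes) can be sketched as a small loop. This is a hypothetical illustration, not IBM’s or any vendor’s method: a running average of observed outcomes stands in for a real predictive model, and the action choice is a simple greedy, multi-armed-bandit-style rule. The action names and outcome values are made up.

```python
"""Minimal sketch of a prescriptive loop: predict, act, record feedback.
All names, actions and numbers are hypothetical."""

from collections import defaultdict


class PrescriptiveLoop:
    def __init__(self, actions):
        self.actions = list(actions)
        self.totals = defaultdict(float)  # sum of observed outcomes per action
        self.counts = defaultdict(int)    # how often each action was taken

    def predict(self, action, prior=0.0):
        """Predictive model: mean outcome observed so far for this action."""
        n = self.counts[action]
        return self.totals[action] / n if n else prior

    def recommend(self):
        """Prescriptive step: pick the action with the best predicted outcome.
        Untried actions get an optimistic prior so each one is tried once."""
        return max(self.actions, key=lambda a: self.predict(a, prior=float("inf")))

    def feedback(self, action, outcome):
        """Feedback system: record the outcome produced by the action taken."""
        self.totals[action] += outcome
        self.counts[action] += 1


if __name__ == "__main__":
    loop = PrescriptiveLoop(["discount", "newsletter", "do_nothing"])
    # Hypothetical outcome (e.g. extra sales) that each action produces.
    observed = {"discount": 12.0, "newsletter": 7.5, "do_nothing": 0.0}
    for _ in range(5):
        action = loop.recommend()
        loop.feedback(action, observed[action])
    print(loop.recommend())  # discount
```

After trying each action once, the loop keeps recommending the one with the best tracked outcome. A real system would also keep exploring, since outcomes are probabilistic and can change over time.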

This new type of analytics was first introduced in 2013, after Descriptive Analytics had been defined as the simplest class of analytics, one that allows you to condense big data into smaller, more useful nuggets of information; the next step in reducing information is applying a predictive algorithm.
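The descriptive-then-predictive chain described above can be illustrated with a toy example: raw records are condensed into a short summary (descriptive), and a simple model fitted to that summary forecasts the next value (predictive). The data and function names are hypothetical, and a straight-line least-squares fit stands in for a real predictive algorithm.

```python
"""Minimal sketch of descriptive -> predictive data reduction.
Data and names are hypothetical; a linear least-squares trend is used
as a stand-in predictive model."""

from collections import Counter


def describe(events):
    """Descriptive step: condense raw (day, amount) records into daily totals,
    returned in day order."""
    totals = Counter()
    for day, amount in events:
        totals[day] += amount
    return [totals[day] for day in sorted(totals)]


def predict_next(series):
    """Predictive step: fit y = a + b*x by ordinary least squares, with x the
    position in the series, and extrapolate one step ahead."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


if __name__ == "__main__":
    # Hypothetical raw records: (day, sale amount).
    events = [(1, 10), (1, 5), (2, 20), (3, 12), (3, 13), (4, 30)]
    daily = describe(events)
    print(daily)                # [15, 20, 25, 30]
    print(predict_next(daily))  # 35.0
```

Six raw records shrink to four daily totals, and those four numbers shrink further to a single forecast, which is the "data reduction" idea in miniature.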

IBM’s vision is that descriptive analytics allows an understanding of what has happened, while advanced analytics, consisting of both predictive and prescriptive analytics, is where there is real impact on the decisions made by businesses every day.


Latest mobile app trends

The mobile app market has changed over the years in many unexpected ways, but if there is one thing everyone expected, it is that the market keeps growing. This fairly new industry has expanded in every direction, pushing the limits of creators’ imagination and of mobile devices’ physical capabilities.

Our team at TheWebMiner has put together a series of graphics showing not only the evolution of the mobile app market for Android and iOS but also the most important trends to follow. The presentation can also be viewed here.

In conclusion, we can say that the app market holds a solid position ahead of the mobile web market, always searching for new possibilities and areas to expand into, some of which will soon go mainstream, accustoming users to concepts like the Internet of Things or mobile payments in everyday situations. We now offer a structured list of all apps on Google Play and iTunes, in any format that suits your needs, covering any of the indicators available on the site (you can find the data here)!

The science behind an internet request

Altruism takes many shapes on the internet, especially on sites designed for user interaction, such as blogs, forums or social networks. The giant Reddit even has a dedicated section, Random Acts of Pizza, which specializes in giving free pizza to strangers whose story is worth one. It is fun, and the motto is as simple as that: “because … who doesn’t like helping out a stranger? The purpose is to have fun, eat pizza and help each other out. Together, we aim to restore faith in humanity, one slice at a time.”

This great opportunity raises a popular question, though: what should one say to get free pizza and, more generally, what should one say to get any kind of free stuff on the internet? A possible answer comes once again from data mining. Researchers at Stanford University analyzed this intriguing problem, limiting their study to Reddit posts.

By mining all of the section’s posts from 2010 until today and passing them through filters such as sentiment analysis, politeness and, most importantly, whether or not the request was successful, the researchers established a pattern.
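The filtering step above turns each post into a handful of measurable signals paired with its outcome. The sketch below shows the general idea with deliberately tiny word lists; these lexicons and function names are hypothetical placeholders, not the ones the Stanford researchers used, and real sentiment and politeness analysis is considerably more sophisticated.

```python
"""Minimal sketch: turn request posts into length, politeness and sentiment
signals paired with success labels. Word lists are hypothetical."""

import re

# Hypothetical mini-lexicons, for illustration only.
POLITE = {"please", "thanks", "thank", "appreciate", "grateful"}
POSITIVE = {"happy", "great", "love", "good", "nice"}
NEGATIVE = {"broke", "sad", "hungry", "bad", "lost"}


def extract_features(text):
    """Turn a post into simple length, politeness and sentiment counts."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "length": len(words),
        "politeness": sum(w in POLITE for w in words),
        # Crude lexicon score: positive words minus negative words.
        "sentiment": sum(w in POSITIVE for w in words)
        - sum(w in NEGATIVE for w in words),
    }


def build_dataset(posts):
    """Pair each post's features with whether its request succeeded."""
    return [(extract_features(text), succeeded) for text, succeeded in posts]


if __name__ == "__main__":
    post = "Please help, I lost my job and I'm hungry. Thanks!"
    print(extract_features(post))
    # {'length': 10, 'politeness': 2, 'sentiment': -2}
```

A dataset built this way is what a classifier would then be trained on to learn which signals go along with successful requests.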

The resulting predictions reach up to 70% accuracy. Besides the sociological observations, such as the positive effect of longer posts or the negative effect of very polite posts, it is interesting to look at the algorithm that made all this possible: it divides the narratives into five types, those that mention money; a job; being a student; family; and a final group that includes mentions of friends, being drunk, celebrating and so on, which the team called “craving.”
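To make the five-way narrative split above more concrete, here is a toy keyword tagger. Both the keyword lists and the matching approach are hypothetical simplifications chosen for illustration; they are not the researchers’ actual method. Note that in this sketch a single post can fall into more than one narrative.

```python
"""Minimal sketch: tag a post with the five narrative types using keyword
matching. Keyword lists are hypothetical; this is not the study's method."""

import re

NARRATIVES = {
    "money": {"money", "broke", "rent", "bills", "paycheck"},
    "job": {"job", "work", "unemployed", "interview", "fired"},
    "student": {"student", "college", "university", "exam", "class"},
    "family": {"family", "wife", "husband", "kids", "mom", "dad"},
    "craving": {"friends", "drunk", "celebrate", "celebrating", "party", "craving"},
}


def narratives(text):
    """Return the narrative types whose keywords appear in the post,
    in the order they are listed in NARRATIVES."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [name for name, keys in NARRATIVES.items() if words & keys]


if __name__ == "__main__":
    print(narratives("I just lost my job and my kids are hungry, rent is due."))
    # ['money', 'job', 'family']
    print(narratives("Celebrating with friends after my last exam!"))
    # ['student', 'craving']
```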

This study plays an important role in analyzing the behavior of peers on the internet and opens a wide area of research toward a better understanding of online consumers around the world.