Author Archives: Bogdan Balcan

Big Data as a part of your life

For those of you who aren’t yet convinced that Big Data is infiltrating more and more of our daily lives, I have one more piece of evidence that data science is the future of all sciences and has the power to interconnect all branches of life. Replacing conventional decision-making with more analytical methods is not a fast process, but the small steps we are taking are certainly headed in that direction.


Although the improvement was only 12%, Los Angeles took a giant leap last week when it announced that, after more than 30 years of effort by local authorities, it had managed to synchronize all the traffic signals in the city of 4 million residents and adapt them to traffic conditions, making this an experiment we will all have something to learn from in the near future.

Research engineers at the state DOT explained that synchronizing traffic lights is only a small step toward decongesting traffic, which is why they are more focused on studying drivers’ habits. Since the ’80s they have gathered data anonymously, first through a network of cables and sensors embedded in the roads, called loop wires, and more recently, as technology evolved, through toll booths used to calculate the average speed of cars and wireless sensors that estimate the number of people in a car from the number of phones inside and their behavior.
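
As a rough illustration of the toll-based measurement described above, the sketch below estimates a car’s average speed from two timestamped readings. The function name, the reading format and the sample values are assumptions made for illustration; they do not describe the DOT’s actual system.

```python
from datetime import datetime


def average_speed_kmh(distance_km, passed_first, passed_second):
    """Average speed between two toll points, in km/h.

    distance_km is the road distance between the two points; the two
    datetimes are when the same (anonymized) vehicle passed each point.
    """
    elapsed_hours = (passed_second - passed_first).total_seconds() / 3600
    if elapsed_hours <= 0:
        raise ValueError("second reading must be later than the first")
    return distance_km / elapsed_hours


# Hypothetical readings: 12 km covered in 15 minutes.
speed = average_speed_kmh(
    12.0,
    datetime(2013, 4, 1, 8, 0, 0),
    datetime(2013, 4, 1, 8, 15, 0),
)
print(round(speed, 1))
```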

The next step, traffic researchers say, will be to find a way to communicate with the car itself, both to gather data and ultimately to provide real-time feedback on congested streets, alternative routes or energy management tips. Last but not least, improving public transportation in every form must be a priority, as must introducing new approaches such as modular self-driving cars that link together over a shared stretch of road, all for better management of transportation.

A Simple Market Analysis of Apple Inc.

I now want to write about a phenomenon that is part of our lives with or without our consent. I’m talking about Apple’s market strategy and how its intrusive policy of becoming one of the most exquisite brands has affected our perception of quality and innovation.

There’s no point in denying the direct correlation between Apple’s total revenue and the influence it has on the electronics market; a look at the sales graph is enough to see it.

Furthermore, although the statistics we found aren’t very recent, we can add that last week Apple Inc. sold 10 million units of its latest product, double the original estimates. The number would have been even larger, but because of the heavy traffic the servers were down for a number of hours.

Now, in terms of data and what interests us: although all expectations were exceeded last week, the only thing still bringing money to Apple is the iPhone division. Sales of the iPad are decreasing, as are those of Macs and iPods, but even so the listed stock keeps increasing in value. This can be explained only by following the history of the company from the day it was first listed, back in 1980, until now and observing that its customer satisfaction and its brand recognition as a mark of quality are among the highest ever recorded.


In the end I’d like to say that even though Apple has taken its blows, first from the competitive market of Android and Windows phones, later with the death of Steve Jobs, and most recently with the flaws in the iCloud system that allowed hackers to break in and publish nude pictures of dozens of celebrities, marketing data reveals that the company is still among the top preferences of gadget enthusiasts all over the world, with a consolidated position over Samsung that will not decline any time soon.

Elements of Statistical Learning Walkthrough


Data science can be an art, the art of identifying patterns and decisions before they are even made, and all this with impressive accuracy. For our blog’s comeback I thought I should cover the literary side of this science-art-craft and talk about some of the basic principles set out in some of the finest books about data science.

In today’s article I will focus on a very well structured book by Trevor Hastie, Professor of Mathematical Sciences at Stanford University. The book, co-written with Robert Tibshirani and Jerome Friedman, is called The Elements of Statistical Learning: Data Mining, Inference, and Prediction. It tries, and manages, to give a detailed account of how data led to the development of new tools in the field of statistics and spawned new areas such as data mining, machine learning, and bioinformatics. The book describes the important ideas in these fields in a common conceptual framework.


Although the approach is mainly statistical, the emphasis falls on concepts rather than on mathematics. Many examples are given, with an easy-to-understand use of color graphics. It is a valuable resource for statisticians and for anyone interested in data mining in science or industry. The book’s coverage is broad, from supervised learning (better known as prediction) to unsupervised learning. The many topics covered include neural networks, support vector machines, classification trees and boosting, the last of which gets its first comprehensive treatment in any book of this kind.

All in all, I can certainly say that the presentation does not dwell on mathematical aspects and does not provide a deep analysis of why a specific method works. Instead, it gives you some intuition about what a method is trying to do, and this is why I like the book so much. Without going into the mathematical details of complicated algorithms, it summarizes all the necessary (and really important) things one needs to know. Sometimes you understand a passage only after doing a lot of research on the subject and coming back to the book. Nevertheless, the authors are great statisticians and certainly know what they are talking about!

PriceAlert, a new sheriff in town!

If you have ever tried to use a tool to compare the prices of two or more products across different online shops, you have surely run into the limitations imposed by such platforms. Apps of this kind usually work by periodically crawling a number of sites and updating a product/price table. This is not very useful for users who want to choose products from shops in different geographical areas or from smaller sites that the app hasn’t crawled.

Now, maybe it’s a bit early to talk about the capabilities of this next product, but it’s shaping up nicely and can be a real help for end consumers around the world and even for market analysts. The best part is that it’s completely free, with no ads either. PriceAlert aims to be a new solution for tracking prices across various sites. So far nothing amazing, but the technology behind it allows users to compare prices on any ecommerce platform anywhere in the world: unlike similar platforms, which are limited to a number of well-defined shops, it uses an algorithm that automatically extracts data such as the product’s specifications and, of course, its price and availability.

For now only a beta version is available, but new features are scheduled to arrive very soon. Useful features such as an email alert when a certain price changes, or statistics on how a price has changed over a period of time, will be available, not to mention the ability to export the gathered data into useful formats such as Excel or CSV.
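
To make the alert and export features more concrete, here is a minimal sketch of how a price-change check and a CSV export could look. It is not PriceAlert’s implementation: the function names, the dictionary-based snapshots and the sample prices are all assumptions made for illustration.

```python
import csv
import io


def price_changes(old_prices, new_prices):
    """Return (product, old, new) for every product whose price changed."""
    changes = []
    for product, new_price in sorted(new_prices.items()):
        old_price = old_prices.get(product)
        # Products seen for the first time have no old price to compare.
        if old_price is not None and old_price != new_price:
            changes.append((product, old_price, new_price))
    return changes


def to_csv(changes):
    """Serialize the detected changes as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["product", "old_price", "new_price"])
    writer.writerows(changes)
    return buffer.getvalue()


# Hypothetical snapshots from two consecutive checks.
yesterday = {"phone": 499.0, "tablet": 329.0}
today = {"phone": 479.0, "tablet": 329.0, "laptop": 999.0}

changes = price_changes(yesterday, today)
print(to_csv(changes))
```

Each tuple in `changes` is a natural trigger for an email alert; sending the message itself is left out of the sketch.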

So we think there’s no reason for you not to check out this interesting new toy and maybe leave a review if you feel like it. You can find it here at the address // and we hope that together we can bring one more interesting tool to the people who need it.

Get started with microformats

Microformats are small patterns that can be embedded into your HTML for easier recognition and representation of commonly published items such as people, events, dates or tags. Even though web content is already capable of automated processing, microformats simplify the process by attaching semantics to it, and so lead the way to more reliable automated processing. Many advantages can be listed in favor of microformats, but the most crucial are these.

At this point I should mention that microformats are a huge relief in web scraping, because they define lightweight standards for declaring information in any web page. A related HTML5 concept is Microdata, which lets you define custom items and attach properties to them.
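
To show why this helps scraping, here is a minimal sketch that reads an hCard-style snippet using only Python’s standard library. The class names `vcard`, `fn`, `org` and `tel` come from the hCard microformat; the parser class, the sample markup and the choice of just three properties are assumptions made for illustration.

```python
from html.parser import HTMLParser

HCARD_PROPERTIES = {"fn", "org", "tel"}


class HCardParser(HTMLParser):
    """Collect the text of elements tagged with hCard property classes.

    Simplification: every start tag is assumed to have a matching end tag,
    so void elements such as <br> are not handled.
    """

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        matches = [c for c in classes if c in HCARD_PROPERTIES]
        self._stack.append(matches[0] if matches else None)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute the text to the innermost open element's property.
        if self._stack and self._stack[-1] and data.strip():
            self.card[self._stack[-1]] = data.strip()


html = """
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Corp</span>,
  <span class="tel">+1-555-0100</span>
</div>
"""

parser = HCardParser()
parser.feed(html)
print(parser.card)
```

Because the class names are standardized, the same few lines work on any page that uses the microformat, with no site-specific selectors.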

Now that you know what microformats are, we should focus on the getting-started part. A really useful, quick and detailed guide can be found here, and more complex tasks are also covered. The only thing left is to wish you good luck in implementing them.

Big Data and Data Mining Tools

Recently we tested a data mining tool that I want to write about today. It is called Datameer, and it’s a cloud app based on Hadoop, so we don’t need to install anything on our computers; we only need to have the data that we want analyzed.

Step 1: Importing the data

To import any kind of data, we must first select its format:


Step 2: A small configuration

Some of the settings concern the data format, others the way certain data types are detected. The program tries to detect each column’s type, and data types can also be added from a file:


Step 3: Some fine adjustments
If the program doesn’t detect the columns well, we can set them manually. A drawback of the program is that at this step we can adjust the data only by removing the records that don’t match the newly defined data type.


Step 4: Selecting the sample used for the preview


That is all that needs to be done to add data into Datameer. Next, an Excel-like interface shows all the data.
Here we find a few buttons responsible for the magic:

Column Dependency
Shows the relationship between different columns, basically whether one variable depends on another.
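
How Datameer measures this dependency is not covered here. As a simple stand-in, the sketch below uses the Pearson correlation coefficient, which captures linear dependence between two numeric columns; the column names and values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical columns: price depends linearly on size, rating does not.
size = [1.0, 2.0, 3.0, 4.0]
price = [10.0, 20.0, 30.0, 40.0]
rating = [3.0, 5.0, 5.0, 3.0]

print(pearson(size, price))
print(pearson(size, rating))
```

A value near 1 or -1 signals a strong linear relationship, while a value near 0 signals none; non-linear dependence needs a different measure.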

Clustering
Using this we can group similar data.
All the discovery is done by the program; we only have to specify the number of clusters that we want.
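
The post does not say which algorithm Datameer runs. A common choice that also takes the number of clusters as its only input is k-means, sketched below in plain Python. The fixed seeding (the first k points) and the sample points are assumptions that keep the example short and repeatable.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    return centroids, clusters


# Hypothetical data: two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```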

Decision Tree
Builds a decision tree based on the data.

These are all the important functions of Datameer, but the true value of this app lies not in its functions but in its ability to process a huge quantity of data.

Processing of large text files

At TheWebMiner we often need to process large text files, and when I say large I mean files of a few hundred megabytes or more. Of all the text processing tools we’ve tested so far, we concluded that the best is Vim, or gVim (the graphical version of this famous editor, which we use on Windows).

Regular Expressions

Another useful tool in file processing is regular expressions, or, more simply, RegEx. These expressions help us find, or find and replace, pieces of text that follow a certain format, all done automatically. Combining the two tools raises a new question.
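
As an illustration of find-and-replace by format, the sketch below rewrites dates with Python’s standard `re` module, reading the input line by line so that a file of several hundred megabytes never has to fit in memory. The date-rewriting task, the function name and the file paths are assumptions chosen for the example.

```python
import re

# Hypothetical task: rewrite dates from DD/MM/YYYY to YYYY-MM-DD.
DATE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


def rewrite_dates(src_path, dst_path):
    """Stream src_path line by line so the whole file never sits in memory."""
    replaced = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # subn returns the rewritten line and the number of matches.
            new_line, count = DATE.subn(r"\3-\2-\1", line)
            replaced += count
            dst.write(new_line)
    return replaced


# The same pattern applied to a single string.
print(DATE.sub(r"\3-\2-\1", "Invoice dated 25/12/2013"))
```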

How do we use Regular Expressions in Vim?

Vim has its own RegEx syntax, so standard regular expressions cannot be used in Vim directly, but we have created and made available a converter for this purpose. You can find the converter on our site, and we hope that it will be of help.

Get ready to adapt your business to the future!




Recently, while browsing the internet, I stumbled upon an article that captured my whole attention. Articles about how life is going to change in the near future and how technology is becoming more and more a part of our lives are easy to find, but this one puts the accent on business development in the near and more distant future, in all of its aspects, some of which I want to share with you.

First of all, we want to remind you that change is mandatory. Keeping a company on the same business plan for a long time can lead only to stagnation and, in the end, to failure, as seen in the cases of Kodak and, more recently, BlackBerry.

With these examples in mind, we must not be afraid to embrace promising new technologies and stay ahead of competitors. The job market will also evolve, creating new jobs that sound simply weird today, such as Nostalgist, Simplicity Expert or End-of-Life Therapist, while also replacing more than 2 billion of today’s jobs through automation.

Coming to the advertising sector, we can say that the revolution has already become part of our lives, through the development of targeting algorithms meant to deliver the most relevant ads to us, culminating in their knowing us better than those closest to us.

There is much more to tell about how things are predicted to change in the next few decades, and we can’t stay ahead of everything, but the least we can do is be prepared and optimistic about change. After all, curiosity is our greatest gift.




Prescriptive Analytics is what really matters

I don’t know how much you have heard about Prescriptive Analytics, because it is not as popular as Descriptive and Predictive Analytics, but it surely has the power to change how we treat Big Data.

Taking a blunt look at the situation, we can say that Prescriptive Analytics is the new term for the step from analytics to knowledge in the data-to-knowledge pyramid. Predictive analytics is the next step up in data reduction. It uses a variety of statistical, modeling, data mining, and machine learning techniques to study recent and historical data, thereby allowing analysts to make predictions about the future. As we know, big data brings with it a huge amount of information, most of which is useless, hence the need for this new kind of analytics.

The purpose of predictive analytics is not to tell you what is going to happen in the future but, because of its probabilistic nature, to inform you of what MIGHT happen. Prescriptive analytics builds on a predictive model with two additional components: actionable data and a feedback system that tracks the outcome produced by the action taken.

This new type of analytics was first introduced in 2013. Before it, Descriptive Analytics had been defined as the simplest class of analytics, one that allows you to condense big data into smaller, more useful nuggets of information; the next step in reducing information is applying a predictive algorithm.
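
The difference between the three kinds of analytics can be sketched on a toy sales series. Everything here is an assumption made for illustration: the function names, the linear-trend forecast, the reorder rule and the numbers. Real systems use far richer models.

```python
def describe(sales):
    """Descriptive: condense the history into a summary."""
    return {"mean": sum(sales) / len(sales), "last": sales[-1]}


def predict_next(sales):
    """Predictive: extrapolate a least-squares linear trend one step ahead."""
    n = len(sales)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(sales) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sales)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y + slope * (n - mean_x)


def prescribe(forecast, stock):
    """Prescriptive: turn the forecast into a recommended action."""
    shortfall = forecast - stock
    return ("reorder", shortfall) if shortfall > 0 else ("hold", 0.0)


def feedback(forecast, actual):
    """Feedback: track how far the real outcome was from the forecast."""
    return actual - forecast


sales = [100.0, 110.0, 120.0, 130.0]
forecast = predict_next(sales)
action = prescribe(forecast, stock=125.0)
print(describe(sales), forecast, action)
```

The error returned by `feedback` is what a prescriptive system would feed into the next round of predictions.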

IBM’s vision is that descriptive analytics allows an understanding of what has happened, while advanced analytics, consisting of both predictive and prescriptive analytics, is where there is real impact on the decisions made by businesses every day.


Latest mobile app trends

The mobile app market has changed over the years in many unexpected ways, but if there is one thing everyone expected, it is that the market would keep growing. This rather new industry has expanded in every direction, pushing the limits of creators’ imagination and of the physical capabilities of mobile devices.

The TheWebMiner team has put together a series of graphics showing not only the evolution of the mobile app market for Android and iOS but also the most important trends to follow. The presentation can also be viewed here.

In conclusion, we can certainly say that the app market holds a steady position over the mobile web market, always searching for new possibilities and expanding into new areas, a few of which will soon go mainstream and accustom users to concepts like the Internet of Things or Mobile Payments in everyday situations. A structured list of all the apps on Google Play and iTunes is now available in any format that suits your needs, with respect to any indicators on the site (you can find the data here)!