Tag Archives: Big Data

Automatic Data Processing

Most of the data we deliver is in CSV (comma-separated values) format. Each row represents a value, and the different properties of that value are, you guessed it, separated by commas. Of course, for this to have any meaning, the order of these properties must be the same in every row (every value).
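As a minimal sketch of why the fixed property order matters, the snippet below reads comma-separated rows with Python’s standard csv module and maps every row onto the same ordered list of field names. The sample rows and the field names (name, city, employees) are made up for illustration and are not taken from our data.

```python
import csv
import io

# Hypothetical sample data: two rows, each with the same property order.
SAMPLE = "Acme Ltd,London,120\nGlobex,Paris,45\n"

# Assumed field order; it only works because every row keeps this order.
FIELDS = ["name", "city", "employees"]


def parse_rows(text, fields):
    """Map each comma-separated row onto the fixed, ordered field names."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(fields, row)) for row in reader]


records = parse_rows(SAMPLE, FIELDS)
print(records[0]["city"])
```

If a single row listed its properties in a different order, the same code would silently put values under the wrong names, which is why the order has to be kept for every row.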

Many of our clients require data in their own format, so here I want to describe one case study in detail:

Let X be a client asking for a database of certain companies in an area, in order to load it into the company’s CRM and develop a marketing strategy.

How Google search searches

In today’s post I want to get a little bit into the technology behind Google search, especially into the relational algorithm that orders the websites in a search. The idea came to me when I was researching new uses for databases used by large IT corporations, like freebase.com. In this particular case, Freebase is a collection of structured data useful in any domain; what sets it apart from other databases is that users can submit their own researched data, turning it into a wiki-style portal.

Prediction Algorithms Are Catching Up

I’m coming back to writing after a short break because data science is always evolving. In this respect, 2015 promises to be a remarkable year for Big Data.

This month a new technology was introduced to the public by researchers at the Ohio State University: more specifically, a new privacy-preserving algorithm called Crowd-ML. This is a machine learning framework for crowdsensing systems that consist of a number of smart devices and a server. The strength of Crowd-ML lies in implementing sensing, learning and privacy mechanisms together: it can build classifiers or predictors of interest from crowdsensing data using the processing capabilities of the devices, with formal privacy guarantees.

Big Data as a part of your life

For those of you who aren’t yet convinced that Big Data is infiltrating our daily lives more and more, I have one more piece of evidence that data science is the future of all sciences and has the power to interconnect all branches of life. It’s not a fast process, replacing conventional ways of making decisions with more analytical ones, but the small steps we are taking are certainly headed in that direction.

Although the improvement was only 12%, Los Angeles took a giant leap last week when it announced that, after more than 30 years of effort by local authorities, it had managed to synchronize all the traffic signals in the city of 4 million residents and adapt them to traffic conditions, making this an experiment we will all have something to learn from in the near future.

Research engineers at the state DOT explained that synchronizing traffic lights is only a small step in decongesting traffic, which is why they are more focused on studying drivers’ habits. Since the ’80s they have gathered data anonymously, first through a network of cables and sensors embedded in the roads, called loop wires, and more recently, as the technology evolved, through toll booths used to calculate the average speed of cars, or wireless sensors that detect the number of people in a car from the number of phones inside and their behavior.

The next step, traffic researchers say, will be to find a way to communicate with the car itself to gather data and, ultimately, to provide real-time feedback on congested streets, alternative routes or energy management tips. Last but not least, improving public transportation in every form must be a priority, as must implementing new approaches such as modular self-driving cars linked together for a shared stretch of road, all for better management of transportation.

Big Data and Data Mining Tools

Recently we tested a data mining tool that I want to write about today. It is called Datameer, and it’s a cloud app based on Hadoop, so we don’t need to install anything on our computers; we only need to have the data that we want analyzed.

Step 1: Importing the data

To import any kind of data we must first select its format.

Step 2: A small configuration

Some of the settings concern the data format, others the way certain data types are detected. The program tries to detect each column’s type, and it is also possible to add data types from a file.

Step 3: Some fine adjustments

If the program doesn’t detect the columns correctly, we can set their types manually. A drawback of this program is that at this step we can adjust the data only by removing the records that don’t match the newly defined data type.
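To make steps 2 and 3 more concrete, here is a rough sketch of how column types can be guessed and how records that don’t match a chosen type can be dropped. It is not Datameer’s implementation; the function names detect_type and drop_mismatches, and the three type labels, are assumptions made for this example.

```python
def detect_type(values):
    """Guess a column type from its string values: 'integer', 'float' or 'text'."""

    def all_parse(cast):
        # True only if every value can be converted with the given cast.
        for value in values:
            try:
                cast(value)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "text"


def drop_mismatches(rows, column, cast):
    """Keep only the rows whose value at index `column` converts with `cast`."""
    kept = []
    for row in rows:
        try:
            cast(row[column])
        except ValueError:
            continue  # the record does not fit the chosen type, so it is removed
        kept.append(row)
    return kept


rows = [["Acme", "120"], ["Globex", "n/a"], ["Initech", "45"]]
print(detect_type([r[1] for r in rows]))
print(drop_mismatches(rows, 1, int))
```

Dropping records is the bluntest possible fix, which is exactly the limitation described above: a value like "n/a" cannot be corrected at this step, only removed together with its row.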

Step 4: Selecting the sample used for the preview

That is all there is to adding data into Datameer. From here on, an Excel-like interface shows all the data.
Here we can find a few buttons responsible for the magic:

Column Dependency
Shows the relationships between different columns, basically whether one variable depends on another.
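We don’t know which measure Datameer uses internally for this. As a stand-in, the sketch below computes the Pearson correlation coefficient, one common way to check whether two numeric columns move together. The function name pearson is our own choice for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns, in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear dependency
    return cov / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3], [2, 4, 6]))
```

A value close to 1 or -1 suggests a strong linear dependency between the two columns, while a value near 0 suggests none; it says nothing about non-linear relationships.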

Clustering
Using this we can group similar data.
All the discovery is done by the program; we only have to specify the number of clusters that we want.
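The post does not say which clustering algorithm Datameer runs. To illustrate the idea of “give the number of clusters and let the program do the rest”, here is a small one-dimensional k-means sketch. k-means itself, the function name kmeans_1d and the even spread of the starting centers are all assumptions made for this example, and it expects 1 <= k <= len(values).

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters; returns (centers, clusters)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers evenly over the sorted values.
    step = max(k - 1, 1)
    centers = [ordered[(i * (len(ordered) - 1)) // step] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], k=2)
print(centers, clusters)
```

The only input besides the data is k, the number of clusters, which matches what we have to specify in the tool.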

Decision Tree
Builds a decision tree based on the data.

These are all the important functions of Datameer, but the true value of this app lies not in the functions themselves but in its ability to process a huge quantity of data.

Prescriptive Analytics is what really matters

I don’t know how much you have heard about Prescriptive Analytics, because it is not as popular as Descriptive and Predictive Analytics, but it certainly has the power to change how we treat Big Data.

By taking a blunt look at this situation we can say that Prescriptive Analytics is the new term for the step from analytics to knowledge in the data-to-knowledge pyramid. Predictive analytics is the next step up in data reduction. It uses a variety of statistical, modeling, data mining, and machine learning techniques to study recent and historical data, thereby allowing analysts to make predictions about the future. As we know, big data brings a huge amount of information, the majority of which is useless, hence the need for this new service.

The purpose of analytics is not to tell you what is going to happen in the future but, because of its probabilistic nature, to inform you of what MIGHT happen. Prescriptive analytics is based on a predictive model with two additional components: actionable data and a feedback system that tracks the outcome produced by the action taken.

This new step, or type, of analytics was first introduced in 2013, after Descriptive Analytics had been defined as the simplest class of analytics, one that allows you to condense big data into smaller, more useful nuggets of information; the next step in reducing information is then to apply a predictive algorithm.

IBM’s vision is that descriptive analytics allows an understanding of what has happened, while advanced analytics, consisting of both predictive and prescriptive analytics, is where there is real impact on the decisions made by businesses every day.

How to use regex in Vim?

We often need to process big text files (larger than 100 MB), and we found that the best text editor for this is Vim, or gVim (its graphical version, available for Windows). Regular expressions (also called RegEx) are also a powerful way to process text automatically.

Using RegEx in Vim

Vim uses its own regular expression syntax rather than the standard (Perl-style) one, so we built a tool that converts standard regex to Vim regex. The tool is available here: RegEx to Vim.

We hope you find it useful.
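To show the kind of difference such a conversion has to deal with, here is a small sketch; it is not the code of our tool. It covers only one rule of Vim’s default (“magic”) mode: the operators +, ?, |, (, ) and { need a backslash in Vim, while in Perl-style regex the backslash makes them literal. The function name to_vim_magic is made up for this example, and lazy quantifiers, lookarounds, non-capturing groups, \b word boundaries and escapes inside [...] are deliberately left untouched.

```python
# Operators that are bare in Perl-style regex but backslash-escaped in Vim "magic" mode.
SWAPPED = set("+?|(){")


def to_vim_magic(pattern):
    """Convert a simple Perl-style pattern to Vim's default regex syntax."""
    out = []
    i = 0
    in_class = False  # True while inside a [...] character class
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if not in_class and nxt in SWAPPED:
                out.append(nxt)  # escaped operator is a literal: drop the backslash
            else:
                out.append(ch + nxt)  # any other escape is copied unchanged
            i += 2
            continue
        if in_class:
            out.append(ch)  # characters inside a class are copied verbatim
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            out.append(ch)
        elif ch in SWAPPED:
            out.append("\\" + ch)  # bare operator: Vim wants it escaped
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(to_vim_magic(r"(foo|bar)+"))
```

For patterns that use only these operators, the result can be pasted after / or into :s in Vim; anything more advanced needs rules that this sketch does not have.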

Experimentation is a Must

This word cloud represents the answers given by more than 600 representatives of different online companies to the question “Which areas are you going to be experimenting with most heavily in the coming year?”

Experimenting is one of the key tools of marketing engineering, and although the results of test-and-learn approaches are increasingly appreciated, establishing the right culture is what holds most companies back, if we set aside the fear of failure.

What are the most exciting opportunities for companies this year?

Hello everybody. Today I want to share with you new research conducted by Adobe that we got our hands on recently. It’s actually a forward-looking view of the digital world and a briefing on what to expect from this field in 2014. If last year was about recognizing the importance of customer experience, this one is about actually doing something. For an optimal customer experience, various business functions and customer-facing touch points need to be working in harmony, from customer service and advertising to online user experience, content management and email messaging. To the question “Which one area is the single most exciting opportunity for your organization (or for your clients) in 2014?” the responses were diverse but followed a well-defined pattern.

We know the number of company respondents (980) and of agency respondents (1202), and from their answers we conclude that organizations need to ensure they have the right data, technology and culture to act as the foundation for a great customer experience, with a focus on multichannel marketing and campaign management also required to underpin a successful approach.

Mobile, the second most exciting opportunity in the eyes of client-side respondents and first on the list for supply-side participants, has a well-earned position. We already wrote about this in another post a few weeks ago, so it is no surprise at all. Despite the importance of mobile and the prominence of smartphones and tablets in our lives, many companies are still trying to work out how to optimize their websites for mobile, for example whether they should go down the ‘responsive’ route or not.

A second figure that I’d like to discuss is not so much about the big picture and more about specific disciplines. The question is: which three digital-related areas are the top priorities for your organization (or for your clients) in 2014?

It seems that marketers and digital professionals are clear on what the priorities are, and this has not changed markedly in the last year: the top five options are in exactly the same order as in last year’s survey, namely content marketing, social media engagement, targeting and personalization, conversion rate optimization and mobile optimization.

Data Grants from Twitter

There is a lot of fuss these days about the latest announcement made by Twitter on February 5. They encourage research institutions to apply by March 15, in what seems to be a scientific lottery, for a chance to access Twitter’s data sets. Around 500 million Tweets are sent out each day, and if they were scientifically quantified, topics such as where the flu may hit, health-related information or events like ringing in the new year could be analyzed from a statistical point of view and outcomes could be predicted.

Twitter acknowledges the difficulties that researchers face when collecting data from the platform, and therefore launched this project, named Twitter Data Grants, aiming for a better connection between research institutions or academics and the data they need. Along with the data itself, the company will offer the selected institutions the possibility of collaborating with its own engineers and researchers, all with the help of Gnip, one of Twitter’s most important certified data reseller partners.

TheWebMiner Blog

cloud web scraping tool
