We often scan the entire landscape of mobile apps to produce useful data for research. Based on this activity, we have an idea of how many apps exist in the world. Of course, the values are estimates. More details about this calculation are given in the table below:
Most of the data we deliver is in CSV (comma-separated values) format. Each row represents one record, and the different properties of that record are, you guessed it, separated by commas. Of course, for this to have any meaning, the order of these properties is kept the same in every row.
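As a minimal sketch of reading such a file, the snippet below uses Python's standard csv module. The sample data, the column names and the presence of a header line are assumptions made for illustration, not a description of our actual deliveries.

```python
import csv
import io

# Hypothetical sample: column names and values are illustrative only.
SAMPLE = (
    "name,city,employees\n"
    '"Acme, Inc.",Bucharest,25\n'
    "Globex,Cluj,120\n"
)

def read_rows(text):
    """Return each CSV row as a dict keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(SAMPLE)
for row in rows:
    # Properties are looked up by name; the fixed column order makes this safe.
    print(row["name"], row["city"], row["employees"])
```

Note that a value which itself contains a comma has to be quoted, as in the first data row; the csv module handles this, which is why it is usually preferable to splitting lines on commas by hand.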
Many of our clients require data in their own format, so here I want to describe a case study in detail:
Take a client X who asks for a database of certain companies in an area, in order to load it into the company CRM and develop a marketing strategy.
Today we are glad to announce the collaboration between our company and WEBCentric. This is the most important transaction our company has made to date. Below you can find the press release.
Only now, at the end of 2015, can we realize the magnitude of a whole year and what we managed to accomplish in that time. For us at TheWebMiner, this year was a full one, marked by new experiences, connections and, most importantly, successful data extractions.
More than that, 2015 was a productive year. After running an internship with several students from the Bucharest Academy of Economic Studies, we managed to expand our team by one member, a devoted programmer just as passionate about this science as any of us.
2015 came and is by now almost gone, and we can see that we’ve mostly been let down by popular expectations from the media industry, like hoverboards, flying cars or laser guns.
It’s obvious that we all wish for such cool gadgets and say we are eager to use them, but are we actually? In this matter data science has a say: it establishes itself as an expression of people’s hidden wishes by underlining not what they say or what they wish for, but what they actually do in order to fulfill a goal. By now we have determined that people love to read or watch SF but don’t actually want to experiment with dangerous technologies that can be unstable. However much The Jetsons inspired us, things are not quite so safe, and from a Darwinian point of view this caution is the most normal thing.
By existing, I mean a non-empty EBS volume, that is, an already formatted device.
It’s very simple:
1. Use the lsblk command to view all attached devices:
[ec2-user ~]$ lsblk
NAME  MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvdf  202:80   0  22G  0 disk
xvda1 202:1    0   8G  0 disk /
2. Create a directory and mount the device:
[ec2-user ~]$ sudo mkdir /mnt/my-data
[ec2-user ~]$ sudo mount /dev/xvdf /mnt/my-data
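If you want to pick out the not-yet-mounted device from a script rather than by eye, the lsblk listing from step 1 can be parsed. The following is only a rough sketch: it assumes the default column layout shown above, with MOUNTPOINT as the last column, and device names without the tree-drawing characters that lsblk prints for nested partitions.

```python
# Sample text copied from the lsblk listing in step 1.
LSBLK_OUTPUT = """\
NAME  MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvdf  202:80   0  22G  0 disk
xvda1 202:1    0   8G  0 disk /
"""

def unmounted_devices(lsblk_text):
    """Return names of devices whose MOUNTPOINT column is empty."""
    lines = lsblk_text.strip().splitlines()
    header = lines[0].split()
    names = []
    for line in lines[1:]:
        fields = line.split()
        # A row with fewer fields than the header has no mount point.
        if len(fields) < len(header):
            names.append(fields[0])
    return names

print(unmounted_devices(LSBLK_OUTPUT))
```

For the listing above this reports xvdf, the device that step 2 mounts at /mnt/my-data.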
I recently saw an event at a university in Romania (Universitatea Politehnica Bucuresti) that aims to help students choose the subject of their degree thesis. At this event, companies are invited to present themes to the students. Below you will find a short list of themes related to our industry:
1. Automatic website classification
Possible categories: e-commerce, company website, news/blog, other.
2. Detecting website structure (and representing it as a tree)
E.g., the first level of an online store contains the main categories, the second level the subcategories, and the n-th level the product pages. The entire website can be represented as a tree.
3. Logo detection on the internet
When detecting logos on a web page, multiple issues might occur, for example many logos in the same image, or scaled logos.
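To make theme 2 a bit more concrete, here is a small sketch that builds such a tree from a list of page URLs. It rests on a strong assumption, namely that the URL paths mirror the category hierarchy of the store, which is often not the case on real websites. The URLs and the function names are hypothetical.

```python
from urllib.parse import urlparse

def build_tree(urls):
    """Nest URL path segments into a dict-of-dicts tree."""
    tree = {}
    for url in urls:
        node = tree
        for segment in urlparse(url).path.strip("/").split("/"):
            if segment:
                # Reuse the child node if it exists, otherwise create it.
                node = node.setdefault(segment, {})
    return tree

def depth(tree):
    """Number of levels below the root (0 for an empty tree)."""
    return 1 + max(map(depth, tree.values())) if tree else 0

def show(tree, indent=0):
    """Print the tree with one indentation step per level."""
    for name, children in sorted(tree.items()):
        print("  " * indent + name)
        show(children, indent + 1)

# Hypothetical store: main category / subcategory / product page.
URLS = [
    "https://shop.example.com/electronics/phones/phone-123",
    "https://shop.example.com/electronics/laptops/laptop-9",
    "https://shop.example.com/books/novels/novel-1",
]

site = build_tree(URLS)
show(site)
```

A thesis on this theme would have to go further, for example by following links and comparing page templates, since the path alone is only a first hint at the structure.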
Please let us know if you want to develop one of the above themes, and we will help you with the results of our research.
I always thought that companies have needs that are different from those of end users (see the classification by target, B2C or B2B). And I think that this hypothesis also holds on the internet. These days I have been busy developing a TheWebMiner Filter, and in the following lines I want to talk about internet search.
What is internet searching?
What I (and maybe many of you) understand by search is sorting. Google, Bing and other search engines try hard to find the most relevant page for our query, and the results are impressive. A colleague of mine told me that if you describe a movie's plot in a Google search, Google will find the Wikipedia page of the movie. But this is the end user's point of view.
Now we have a database with Pebble apps. You can find it on the download page.
Today we have a new toy. We have built an XML sitemap generator as a Google Chrome extension. You can download it from here: https://chrome.google.com/webstore/detail/thewebminer-sitemap-gener/gdljgjdcflclcapfnoejmbpodgajkbcd?hl=en
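For readers who have not seen one, an XML sitemap is simply a list of page addresses wrapped in a urlset element. The sketch below shows the general idea in Python; it is not the code of the extension, and the function name and example URLs are hypothetical. Only the namespace comes from the public sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a minimal XML sitemap listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # <loc> holds the address of the page; special characters are escaped.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages of an example site.
xml_text = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/blog?page=1&sort=new",
])
print(xml_text)
```

The protocol also allows optional fields such as lastmod for each entry; they are left out here to keep the example short.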