
Big data

We live in a digital age in which billions of digital devices emit high-value "big data" every minute. These devices include smartphones, electric vehicles, smart watches, laptops, smart meters, and the sensors deployed across smart cities.

Big data refers to extremely high-volume data that may be structured or unstructured. Structured data typically comes in a table format with clearly labeled columns, while unstructured data has no fixed tabular layout (e.g. Twitter posts, Facebook posts, YouTube videos, Instagram posts).
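To make the distinction concrete, here is a minimal sketch in Python. The smart-meter records, the field names (`meter_id`, `hour`, `kwh`) and the sample post are all made up for illustration; they do not come from any real data set.

```python
import re

# Structured data: every record has the same labeled fields, like rows in a table.
# (Hypothetical smart-meter readings, invented for this example.)
meter_readings = [
    {"meter_id": "M-001", "hour": 0, "kwh": 0.42},
    {"meter_id": "M-001", "hour": 1, "kwh": 0.38},
    {"meter_id": "M-002", "hour": 0, "kwh": 1.10},
]

# Because the fields are labeled, answering a question is a simple lookup and sum.
total_kwh = sum(row["kwh"] for row in meter_readings if row["meter_id"] == "M-001")

# Unstructured data: free text with no fixed fields.
# (Hypothetical social media post, invented for this example.)
post = "Charging my EV at home tonight #smartmeter #bigdata"

# Getting anything out of it requires parsing first, here with a regular expression.
hashtags = re.findall(r"#(\w+)", post)

print(round(total_kwh, 2))
print(hashtags)
```

The structured records can be queried directly by field name, while the post has to be parsed before it yields anything usable. That extra parsing step is one reason unstructured data is harder to work with.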
Sources of big data

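Big data is commonly characterized by five "V"s:
1. Velocity: the data is generated, and often must be processed, at high speed
2. Volume: the sheer amount of data is very large
3. Variety: the data comes in many formats, both structured and unstructured
4. Veracity: the accuracy and trustworthiness of the data, which can vary widely
5. Value: the data can yield useful insights once it is analyzed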


Learn more: What is big data?

