
Where to get datasets to practice data science?

Data is the new science. Big data holds the answers. - Pat Gelsinger, CEO

1. Programmable Web 

  • Description: A site where you can find APIs for extracting data from some of the biggest sites on the internet.
  • Link address: API Directory
  • Examples: Google Maps API, Instagram API, Twitter API etc.
2. Postman API Development

  • Description: An online tool that you can use to access a large number of public APIs on the internet. You can also use it to develop your own API if you happen to own a site.
  • Link address: API TOOL 
  • Examples: PayPal API, Adobe API, Coursera API etc.
3. Facebook Graph API
  • Description:  An online tool that you can use to access data about Facebook pages.
  • Link address: API 
  • Examples: graph.facebook.com/youtube - access page data, e.g. likes, number of posts etc.
4. APIGEE
  • Description: An online GUI tool that lets you extract data from, and send data to, various web platforms.
  • Link address: GET & POST TO API
  • Examples: Bing API, GitHub API, Heroku API etc.
5. Kaggle
  • Description:  A data science and machine learning community where you can obtain clean datasets from various sources.
  • Link address: Datasets
  • Examples: Titanic dataset, South African Crime dataset, Trending YouTube video statistics etc.
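
Most of the API sources above boil down to the same pattern: build a request URL, send a GET request, and read the JSON that comes back. Below is a minimal Python sketch of that pattern using only the standard library, with the graph.facebook.com/youtube address from the list as the example. The helper names `build_url` and `fetch_json`, the `fields` values and the `YOUR_TOKEN` placeholder are assumptions made for illustration. Each API documents its own parameters and authentication rules, so check those before sending a real request.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base, path, params):
    """Join a base URL, a resource path and query parameters into one URL."""
    query = urlencode(params)  # percent-encodes keys and values
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    return f"{url}?{query}" if query else url


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical request: the field names and the token are placeholders,
# not values taken from the original post.
url = build_url(
    "https://graph.facebook.com",
    "youtube",
    {"fields": "name,fan_count", "access_token": "YOUR_TOKEN"},
)
print(url)
```

Calling `fetch_json(url)` with a valid token should give you the response as a Python dictionary, ready to be loaded into a table for analysis. That call is left out here because it needs network access and real credentials.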


