Don’t let demons in your data analytics strategy fuel your Halloween scare
Category: Big Data Author: Vishal Kumar Date: 5 years ago Comments: 0


With Halloween just around the corner and the festivities smothered with zombies and goblins, the last thing data science professionals want to see is spooky stuff in their data and analytics. Businesses invest a great deal of resources to get the data analytics game right: a smooth journey from descriptive to predictive to prescriptive. While we embrace the dark side of this festival, here are some spooky things we should never see in our data analytics journey.

Scare of unclean data:
Unclean data leads to unclean analysis and undermines data integrity, which makes trusting the data all the more spooky and gives rise to a Frankenstein of data analysis. So build a proper definition and structure for what counts as clean and unclean data.

Scare of analysis paralysis:
Feeling like a zombie after analysis upon analysis upon analysis? You are not alone! Most of us suffer from it, and that scary feeling tempts us to cut our analysis short just to escape the paralysis. So build a systematic approach to safeguard against analysis paralysis.

Scare of bad models:
Oh yes, we have all heard the stories of models going wrong and producing faulty predictions, costing professionals money and, in some cases, their jobs. So build a good audit strategy to keep your models free of stink and corrosion.

Scare of zombie modelers:
Yes, people! We represent a serious clan of scary individuals who are responsible for enterprise data analytics. One unfocused modeler can lead the effort, the models, and the data wrangling into scary territory, costing a great deal of business and market trust. Such a scare can easily be avoided by creating proper checks and balances within the team to audit the process and make sure it stays relevant.

Scare of disillusioned boss:
Yes, we all know who we are talking about: those gut-based decision makers who avoid data in their decision making and consistently undermine investment in the processes that help the business better manage its data. Awareness is a great way to solve this problem.

This Halloween, make sure your data analytics demons stay checked and quarantined, rather than going for a ride to scare the town.


FREE Big Data sets (Lists and Links)
Category: Big Data Author: Vishal Kumar Date: 5 years ago Comments: 2

We did some brief research into good resources for publicly available data sets and have put together a great list that you can access and play with.

* Please let us know in the comments section if we have missed any lists.

  1. http://adni.loni.ucla.edu/
  2. http://archive.ics.uci.edu/ml/
  3. http://aws.amazon.com/datasets
  4. http://baseball1.com/statistics/
  5. http://blog.freebase.com/2010/11/10/google-refine-previously-freebase-gridworks-2-0-announced/
  6. http://books.google.com/ngrams/
  7. http://build.kiva.org/
  8. http://ckan.net/
  9. http://ckan.org (Comprehensive Knowledge Archive Network)
  10. http://cleaneval.sigwac.org.uk/
  11. http://clic.cimec.unitn.it/marco/research.html
  12. http://code.google.com/p/google-refine/
  13. http://corpus.leeds.ac.uk/internet.html#description
  14. http://crawdad.org/
  15. http://data.austintexas.gov/
  16. http://data.cityofchicago.org/
  17. http://data.gov.bc.ca/
  18. http://data.gov.uk/
  19. http://data.govloop.com/
  20. http://data.medicare.gov/
  21. http://data.reegle.info/
  22. http://data.seattle.gov/
  23. http://data.sfgov.org/
  24. http://data.sunlightlabs.com/
  25. http://data.wien.gv.at/
  26. http://databib.org/
  27. http://datacite.org/
  28. http://datamob.org/
  29. http://datasets.reddit.com/
  30. http://daten.berlin.de/
  31. http://dati.trentino.it/
  32. http://dbpedia.org/
  33. http://del.icio.us/pskomoroch/dataset
  34. http://developer.yahoo.com/geo/geoplanet/data/
  35. http://download.geonames.org/export/dump/
  36. http://econ.worldbank.org/datasets
  37. http://eeg.pl/epi
  38. http://en.wikipedia.org/wiki/Wikipedia:Database_download
  39. http://enigma.io/
  40. http://factfinder.census.gov/servlet/DatasetMainPageServlet
  41. http://figshare.com/
  42. http://freebase.com/
  43. http://ftp.ncbi.nih.gov/
  44. http://gettingpastgo.socrata.com/
  45. http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
  46. http://imat-relpred.yandex.ru/en/datasets
  47. http://infochimp.info/ics/data/ripd/www-personal.umich.edu/~mejn/netdata/
  48. http://infochimps.com/collections/datamob
  49. http://infochimps.org/datasets
  50. http://knoema.com/
  51. http://lib.stat.cmu.edu/datasets/
  52. http://linkeddata.org/
  53. http://liste.sslmit.unibo.it/pipermail/sigwac/2007-November/000041.html
  54. http://medihal.archives-ouvertes.fr/
  55. http://news.ycombinator.com/item?id=1242029
  56. http://opendata.reddit.com/
  57. http://openfmri.org/
  58. http://public.resource.org/
  59. http://quandl.com/
  60. http://radar.oreilly.com/2010/03/open-data-pointers.html
  61. http://rechercheisidore.fr/
  62. http://reddit.com/r/datasets
  63. http://research.stlouisfed.org/fred2/
  64. http://simplegeo.com/
  65. http://snap.stanford.edu/data/index.html
  66. http://sslmit.unibo.it/~baroni/bootcat.html
  67. http://thedatahub.org
  68. http://theinfo.org/
  69. http://timetric.com/dataset/exchange_rates_forex_europe/
  70. http://timetric.com/public-data/
  71. http://unstats.un.org/
  72. http://usgovxml.com/
  73. http://wacky.sslmit.unibo.it/doku.php?id=
  74. http://wiki.dbpedia.org/
  75. http://wiki.openstreetmap.org/wiki/Planet.osm
  76. http://www.acrin.org/
  77. http://www.aishub.net/
  78. http://www.archive-it.org/public/all_collections
  79. http://www.archives.gov/research/alic/tools/online-databases.html
  80. http://www.bls.gov/
  81. http://www.census.gov/geo/www/tiger/tgrshp2010/tgrshp2010.html
  82. http://www.ckan.net/tag/read/size-large
  83. http://www.correlatesofwar.org/Datasets.htm
  84. http://www.crunchbase.com/
  85. http://www.dados.gov.pt/pt/catalogodados/catalogodados.aspx
  86. http://www.daniel-lemire.com/blog/data-for-data-mining/
  87. http://www.dartmouthatlas.org/
  88. http://www.data.gov/
  89. http://www.datakc.org/
  90. http://www.datawrangling.com/some-datasets-available-on-the-web
  91. http://www.datawrangling.com/some-datasets-available-on-the-web.html
  92. http://www.dati.gov.it/
  93. http://www.delicious.com/jbaldwinconnect/DataSets
  94. http://www.diggingintodata.org/Repositories/tabid/167/Default.aspx
  95. http://www.drni.de/wac-tk/index.php/Documentation
  96. http://www.faa.gov/data_research/
  97. http://www.factual.com/
  98. http://www.freebase.com/
  99. http://www.gapminder.org/data/
  100. http://www.google.com/publicdata/directory
  101. http://www.growmeme.com/overview
  102. http://www.gtfs-data-exchange.com/
  103. http://www.guardian.co.uk/news/datablog
  104. http://www.icpsr.umich.edu/icpsrweb/CPES/
  105. http://www.imdb.com/interfaces
  106. http://www.infochimps.com/
  107. http://www.kaggle.com/
  108. http://www.kdnuggets.com/datasets/index.html
  109. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13
  110. http://www.marinetraffic.com/ais/
  111. http://www.nationalarchives.gov.uk/doc/open-government-licence/open-government-licence.htm
  112. http://www.naturalearthdata.com/
  113. http://www.nitrc.org/
  114. http://www.nyc.gov/html/datamine/html/home/home.shtml
  115. http://www.oasis-brains.org/
  116. http://www.ordnancesurvey.co.uk/oswebsite/opendata/
  117. http://www.philwhln.com/how-to-get-experience-working-with-large-datasets
  118. http://www.quantlet.org/mdbase/
  119. http://www.qunb.com/
  120. http://www.quora.com/Where-can-I-get-large-datasets-open-to-the-public?q=dataset
  121. http://www.reddit.com/r/datasets/
  122. http://www.reddit.com/r/opendata
  123. http://www.trustlet.org/wiki/Repositories_of_datasets
  124. http://www2.jpl.nasa.gov/srtm
  125. https://datamarket.azure.com/
  126. https://news.ycombinator.com/user?id=mindcrime
  127. https://pslcdatashop.web.cmu.edu/
  128. https://wist.echo.nasa.gov/~wist/api/imswelcome/
  129. https://www.cia.gov/library/publications/download/

Data Analytics MOOC Courses (MOOC List)
Category: Analytics,Big Data Author: Vishal Kumar Date: 5 years ago Comments: 1
Self Paced

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. Learn the fundamental principles behind it, and how you can use its power to make sense of your Big Data.

Self Paced
 This FREE MOOC (Massive Open Online Course) investigates the use of clouds running data analytics collaboratively for processing Big Data to solve problems in Big Data Applications and Analytics. Case studies such as Netflix recommender systems, Genomic data, and more will be discussed.
Self Paced
 A real Caltech course, not a watered-down version. This is an introductory course in machine learning (ML) that covers the basic theory, algorithms, and applications.
Self Paced Course – Start anytime
 Data Manipulation and Retrieval. In this course, we will explore how to wrangle data from diverse sources and shape it to enable data-driven applications. Some data scientists spend the bulk of their time doing this!
Oct 15th 2015
 Learn how to use the new big data methodologies to analyse large databases and to improve decision-making processes.
Oct 12th 2015

Learn how you can predict customer demand and preferences by using the data that is all around you.

Marketing & Communication

Sep 15th 2015
 Learn various methods of analysis including: unsupervised clustering, gene-set enrichment analyses, Bayesian integration, network visualization, and supervised machine learning applications to LINCS data and other relevant Big Data from high content molecular and phenotype profiling of human cells.
Aug 24th 2015
 The MOOC « Fondamentaux pour le big data » helps you efficiently acquire the prerequisite level of computer science and statistics needed to follow further training in the field of big data.
Aug 25th 2015

Learn why and how knowledge management and Big Data are vital to the new business era.

Business & Management, Social Sciences, Statistics & Data Analysis

Aug 3rd 2015

Learn how to use Hadoop technologies in Microsoft Azure HDInsight to process big data in this five week, hands-on course.

Computer Science: Programming & Software Engineering

Jul 1st 2015

Learn how and when to use key methods for educational data mining and learning analytics on large-scale educational data.

Education, Statistics & Data Analysis

Jun 23rd 2015

Big Data is an extraordinary knowledge revolution that is sweeping through business, academia, government, healthcare, and everyday life. It enables us to provide a healthier life for our children, ensure safety and independence for older people, conserve precious resources like water and energy, and peer into our own individual, genetic makeup. The term “Big Data” describes the accumulation and analysis of vast amounts of information. But Big Data is much more than big data. It’s also the ability to extract meaning: to sort through masses of numbers and find the hidden patterns and unexpected correlations.
You’ll learn from use cases what it takes to extract that value from Big Data, and…

Business & Management, Information, Technology, and Design, Marketing & Communication, Statistics & Data Analysis

Jun 1st 2015

Learn how to apply data science techniques using parallel programming in Apache Spark to explore big (and small) data.

Computer Science: Programming & Software Engineering

Apr 20th 2015
 Join us to explore how the vast amounts of data generated today can help us understand and even predict how humans behave.
May 12th 2015
 Learn about the value, opportunity, and insights that Big Data provides. Get introduced to the Federation Business Data Lake solution to leverage the full power of big data to drive major business strategies.
Oct 28th 2014

This course follows on from Data Mining with Weka and provides a deeper account of data mining tools and techniques. Again the emphasis is on principles and practical data mining using Weka, rather than mathematical theory or advanced details of particular algorithms.

Computer Science: Theory, Information, Technology, and Design, Mathematics, Statistics & Data Analysis

Sep 2nd 2014
 This is an intensive, advanced summer school (in the sense used by scientists) in some of the methods of computational, data-intensive science. It covers a variety of topics from applied computer science and engineering, and statistics, and it requires a strong background in computing, statistics, and data-intensive research.

The 37 best tools for data visualization
Category: Big Data Author: admin Date: 5 years ago Comments: 3


Creating charts and infographics can be time-consuming. But these tools make it easier.

It’s often said that data is the new world currency, and the web is the exchange bureau through which it’s traded. As consumers, we’re positively swimming in data; it’s everywhere, from labels on food packaging to World Health Organisation reports. As a result, for the designer it’s becoming increasingly difficult to present data in a way that stands out from the mass of competing data streams.

One of the best ways to get your message across is to use a visualization to quickly draw attention to the key messages, and by presenting data visually it’s also possible to uncover surprising patterns and observations that wouldn’t be apparent from looking at stats alone.

Not a web designer or developer? You may prefer free tools for creating infographics.

As author, data journalist and information designer David McCandless said in his TED talk: “By visualizing information, we turn it into a landscape that you can explore with your eyes, a sort of information map. And when you’re lost in information, an information map is kind of useful.”

There are many different ways of telling a story, but everything starts with an idea. So to help you get started we’ve rounded up some of the most awesome data visualization tools available on the web.

01. Dygraphs

Help visitors explore dense data sets with JavaScript library Dygraphs

Dygraphs is a fast, flexible open source JavaScript charting library that allows users to explore and interpret dense data sets. It’s highly customizable, works in all major browsers, and you can even pinch to zoom on mobile and tablet devices.
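As a rough sketch of how little code a basic chart takes (the module import and the "graph" container id below are assumptions for illustration, not taken from the article):

```ts
import Dygraph from 'dygraphs';

// Dygraphs accepts CSV text directly: the first column is the x-axis and
// each remaining column becomes its own series.
const csv =
  'Date,Temperature\n' +
  '2015-10-01,18\n' +
  '2015-10-02,21\n' +
  '2015-10-03,17\n';

new Dygraph(document.getElementById('graph') as HTMLElement, csv, {
  title: 'Daily temperature',
  showRangeSelector: true, // adds a zoom/pan range selector under the chart
});
```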

02. ZingChart

ZingChart lets you create HTML5 Canvas charts and more

ZingChart is a JavaScript charting library and feature-rich API set that lets you build interactive Flash or HTML5 charts. It offers more than 100 chart types to fit your data.
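A hedged sketch of the render call (ZingChart is commonly loaded with a script tag that exposes a global zingchart object; the container id and sample values are assumptions):

```ts
// Assumes the ZingChart script is already on the page and exposes a global.
declare const zingchart: any;

zingchart.render({
  id: 'my-chart',          // id of a <div> already in the page
  width: 600,
  height: 400,
  data: {
    type: 'bar',           // one of the many built-in chart types
    series: [{ values: [23, 20, 27, 29, 25] }],
  },
});
```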

03. InstantAtlas

InstantAtlas enables you to create highly engaging visualisations around map data

If you’re looking for a data viz tool with mapping, InstantAtlas is worth checking out. This tool enables you to create highly-interactive dynamic and profile reports that combine statistics and map data to create engaging data visualizations.

04. Timeline

Timeline creates beautiful interactive visualizations

Timeline is a fantastic widget which renders a beautiful interactive timeline that responds to the user’s mouse, making it easy to create advanced timelines that convey a lot of information in a compressed space.

Each element can be clicked to reveal more in-depth information, making this a great way to give a big-picture view while still providing full detail.

05. Exhibit

Exhibit makes data visualization a doddle

Developed by MIT, and fully open-source, Exhibit makes it easy to create interactive maps, and other data-based visualizations that are orientated towards teaching or static/historical based data sets, such as flags pinned to countries, or birth-places of famous people.

06. Modest Maps

Integrate and develop interactive maps within your site with this cool tool

Modest Maps is a lightweight, simple mapping tool for web designers that makes it easy to integrate and develop interactive maps within your site, using them as a data visualization tool.

The API is easy to get to grips with, and offers a useful number of hooks for adding your own interaction code, making it a good choice for designers looking to fully customise their user’s experience to match their website or web app. The basic library can also be extended with additional plugins, adding to its core functionality and offering some very useful data integration options.

07. Leaflet

Use OpenStreetMap data and integrate data visualisation in an HTML5/CSS3 wrapper

Another mapping tool, Leaflet makes it easy to use OpenStreetMap data and integrate fully interactive data visualisation in an HTML5/CSS3 wrapper.

The core library itself is very small, but there are a wide range of plugins available that extend the functionality with specialist functionality such as animated markers, masks and heatmaps. Perfect for any project where you need to show data overlaid on a geographical projection (including unusual projections!).
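To give a flavour of the workflow, here is a minimal sketch that creates a map, pulls in OpenStreetMap tiles and overlays a single marker (the "map" container id and the coordinates are assumptions for illustration):

```ts
import * as L from 'leaflet';

// Create a map inside <div id="map">, add the standard OpenStreetMap tile
// layer, then overlay one data point as a marker with a popup.
const map = L.map('map').setView([51.505, -0.09], 13);

L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
  attribution: '&copy; OpenStreetMap contributors',
}).addTo(map);

L.marker([51.5, -0.09])
  .addTo(map)
  .bindPopup('A data point overlaid on OpenStreetMap tiles.');
```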

08. WolframAlpha

Wolfram Alpha is excellent at creating charts

Billed as a “computational knowledge engine”, the Google rival WolframAlpha is really good at intelligently displaying charts in response to data queries without the need for any configuration. If you’re using publicly available data, it offers a simple widget builder that makes it easy to embed visualizations on your site.

09. Visual.ly

Visual.ly makes data visualization as simple as it can be

Visual.ly is a combined gallery and infographic generation tool. It offers a simple toolset for building stunning data representations, as well as a platform to share your creations. This goes beyond pure data visualisation, but if you want to create something that stands on its own, it’s a fantastic resource and an info-junkie’s dream come true!

10. Visualize Free

Make visualizations for free!

Visualize Free is a hosted tool that allows you to use publicly available datasets, or upload your own, and build interactive visualizations to illustrate the data. The visualizations go well beyond simple charts, and the service is completely free. While the development work requires Flash, output can be rendered through HTML5.

11. Better World Flux

Making the ugly beautiful – that’s Better World Flux

Orientated towards making positive change to the world, Better World Flux has some lovely visualizations of some pretty depressing data. It would be very useful, for example, if you were writing an article about world poverty, child undernourishment or access to clean water. This tool doesn’t allow you to upload your own data, but does offer a rich interactive output.

12. FusionCharts

A comprehensive JavaScript/HTML5 charting solution for your data visualization needs

FusionCharts Suite XT brings you 90+ charts and gauges, 965 data-driven maps, and ready-made business dashboards and demos. FusionCharts comes with an extensive JavaScript API that makes it easy to integrate with any AJAX application or JavaScript framework. These charts, maps and dashboards are highly interactive, customizable and work across all devices and platforms. They also publish a comparison of the top JavaScript charting libraries, which is worth checking out.

13. jqPlot

jqPlot is a nice solution for line and point charts

Another jQuery plugin, jqPlot is a nice solution for line and point charts. It comes with a few nice additional features such as the ability to generate trend lines automatically, and interactive points that can be adjusted by the website visitor, updating the dataset accordingly.
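A minimal hedged sketch of the basic call, assuming jQuery and the jqPlot plugin are already loaded via script tags and a div with id "chartdiv" exists (the container id and sample data are assumptions):

```ts
// jqPlot extends jQuery, so only a global "$" is assumed here.
declare const $: any;

// Each series is an array of [x, y] points plotted into <div id="chartdiv">.
$.jqplot('chartdiv', [[[1, 2], [3, 5], [5, 13], [7, 33]]], {
  title: 'Simple line and point chart',
});
```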

14. Dipity

Dipity has free and premium versions to suit your needs

Dipity allows you to create rich interactive timelines and embed them on your website. It offers a free version and a premium product, with the usual restrictions and limitations present. The timelines it outputs are beautiful and fully customisable, and are very easy to embed directly into your page.

15. Many Eyes

Many Eyes was developed by IBM

Developed by IBM, Many Eyes allows you to quickly build visualizations from publicly available or uploaded data sets, and features a wide range of analysis types, including the ability to scan text for keyword density and saturation. This is another great example of a big company supporting research and sharing the results openly.

16. D3.js

You can render some amazing diagrams with D3

D3.js is a JavaScript library that uses HTML, SVG, and CSS to render some amazing diagrams and charts from a variety of data sources. This library, more than most, is capable of some seriously advanced visualizations with complex data sets. It’s open source, and uses web standards so is very accessible. It also includes some fantastic user interaction support.
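To show the data-binding style that sets D3 apart from canned charting libraries, here is a minimal bar-chart sketch (the data and dimensions are invented for illustration):

```ts
import * as d3 from 'd3';

// D3 has no prebuilt "bar chart" call: you bind data to SVG elements and
// map values to visual attributes yourself.
const data = [4, 8, 15, 16, 23, 42];
const width = 420;
const barHeight = 20;

const x = d3.scaleLinear()
  .domain([0, d3.max(data) ?? 0])
  .range([0, width]);

const svg = d3.select('body')
  .append('svg')
  .attr('width', width)
  .attr('height', barHeight * data.length);

svg.selectAll('rect')
  .data(data)
  .join('rect')
  .attr('y', (_d, i) => i * barHeight)
  .attr('width', d => x(d))
  .attr('height', barHeight - 1);
```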

17. JavaScript InfoVis Toolkit

JavaScript InfoVis Toolkit includes a handy modular structure

A fantastic library written by Nicolas Belmonte, the JavaScript InfoVis Toolkit includes a modular structure, allowing you to only force visitors to download what’s absolutely necessary to display your chosen data visualizations. This library has a number of unique styles and swish animation effects, and is free to use (although donations are encouraged).

18. jpGraph

jpGraph is a PHP-based data visualization tool

If you need to generate charts and graphs server-side, jpGraph offers a PHP-based solution with a wide range of chart types. It’s free for non-commercial use, and features extensive documentation. By rendering on the server, this is guaranteed to provide a consistent visual output, albeit at the expense of interactivity and accessibility.

19. Highcharts

Highcharts has a huge range of options available

Highcharts is a JavaScript charting library with a huge range of chart options available. The output is rendered using SVG in modern browsers and VML in Internet Explorer. The charts are beautifully animated into view automatically, and the framework also supports live data streams. It’s free to download and use non-commercially (and licensable for commercial use). You can also play with the extensive demos using JSFiddle.
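A minimal sketch of the configuration-object style Highcharts uses (the "container" div id and the sample data are assumptions):

```ts
import Highcharts from 'highcharts';

// One configuration object describes the whole chart; Highcharts renders it
// into <div id="container"> and animates the series into view by default.
Highcharts.chart('container', {
  title: { text: 'Monthly visits' },
  xAxis: { categories: ['Jan', 'Feb', 'Mar', 'Apr'] },
  series: [{ type: 'line', name: 'Visits', data: [29, 71, 106, 129] }],
});
```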

20. Google Charts

Google Charts has an excellent selection of tools available

The seminal charting solution for much of the web, Google Charts is highly flexible and has an excellent set of developer tools behind it. It’s an especially useful tool for specialist visualizations such as geocharts and gauges, and it also includes built-in animation and user interaction controls.
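As a rough illustration of the load-and-callback pattern, here is a hedged sketch that assumes the page already includes Google's hosted loader script and a div with id "chart_div":

```ts
// Google Charts is served from Google's hosted loader rather than npm:
//   <script src="https://www.gstatic.com/charts/loader.js"></script>
declare const google: any;

google.charts.load('current', { packages: ['corechart'] });
google.charts.setOnLoadCallback(() => {
  const data = google.visualization.arrayToDataTable([
    ['Task', 'Hours per day'],
    ['Work', 8],
    ['Sleep', 7],
    ['Everything else', 9],
  ]);

  // Draw a pie chart into <div id="chart_div">.
  const chart = new google.visualization.PieChart(
    document.getElementById('chart_div'),
  );
  chart.draw(data, { title: 'My daily activities' });
});
```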

21. Excel

It isn’t graphically flexible, but Excel is a good way to explore data: for example, by creating ‘heat maps’ like this one

You can actually do some pretty complex things with Excel, from ‘heat maps’ of cells to scatter plots. As an entry-level tool, it can be a good way of quickly exploring data, or creating visualizations for internal use, but the limited default set of colours, lines and styles make it difficult to create graphics that would be usable in a professional publication or website. Nevertheless, as a means of rapidly communicating ideas, Excel should be part of your toolbox.

Excel comes as part of the commercial Microsoft Office suite, so if you don’t have access to it, Google’s spreadsheets – part of Google Docs and Google Drive – can do many of the same things. Google ‘eats its own dog food’, so the spreadsheet can generate the same charts as the Google Chart API. This will get you familiar with what is possible before stepping off and using the API directly for your own projects.

22. CSV/JSON

CSV (Comma-Separated Values) and JSON (JavaScript Object Notation) aren’t actual visualization tools, but they are common formats for data. You’ll need to understand their structures and how to get data in or out of them.
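A minimal sketch of getting data out of both formats (the record shape and sample values are invented; for real CSV files use a parser that handles quoting and escaping):

```ts
// The same tiny data set, once as JSON and once as CSV.
interface Row { city: string; population: number; }

// JSON parses directly into objects.
const fromJson: Row[] = JSON.parse('[{"city":"Austin","population":964254}]');

// CSV has to be split into lines and fields; this naive version is fine for
// well-behaved files only.
const csv = 'city,population\nAustin,964254\nSeattle,724745';
const fromCsv: Row[] = csv
  .split('\n')
  .slice(1) // skip the header row
  .map(line => {
    const [city, population] = line.split(',');
    return { city, population: Number(population) };
  });

console.log(fromJson, fromCsv);
```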

23. Crossfilter

Crossfilter in action: by restricting the input range on any one chart, data is affected everywhere. This is a great tool for dashboards or other interactive tools with large volumes of data behind them

As we build more complex tools to enable clients to wade through their data, we are starting to create graphs and charts that double as interactive GUI widgets. JavaScript library Crossfilter can be both of these. It displays data, but at the same time, you can restrict the range of that data and see other linked charts react.
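As an illustration of that linked-filtering idea, here is a hedged sketch of Crossfilter's dimension/group pattern (the record shape is invented, and "crossfilter2" is an assumed package name for the maintained build):

```ts
import crossfilter from 'crossfilter2'; // assumed package name

type Payment = { type: string; amount: number };

// Crossfilter keeps several views of the same records in sync: narrowing
// one dimension immediately changes what every other group reports.
const payments = crossfilter<Payment>([
  { type: 'cash', amount: 20 },
  { type: 'card', amount: 90 },
  { type: 'card', amount: 40 },
]);

const byType = payments.dimension(d => d.type);
const byAmount = payments.dimension(d => d.amount);
const totalByType = byType.group().reduceSum(d => d.amount);

byAmount.filterRange([30, 100]); // restrict the amount range on one chart...
console.log(totalByType.all());  // ...and the per-type totals react elsewhere
```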

24. Tangle

Tangle creates complex interactive graphics. Pulling on any one of the knobs affects data throughout all of the linked charts. This creates a real-time feedback loop, enabling you to understand complex equations in a more intuitive way

The line between content and control blurs even further with Tangle. When you are trying to describe a complex interaction or equation, letting the reader tweak the input values and see the outcome for themselves provides both a sense of control and a powerful way to explore data. JavaScript library Tangle is a set of tools to do just this.

Dragging on variables enables you to increase or decrease their values and see an accompanying chart update automatically. The results are only just short of magical.

25. Polymaps

Aimed more at specialist data visualisers, the Polymaps library creates image and vector-tiled maps using SVG

Polymaps is a mapping library that is aimed squarely at a data visualization audience. Offering a unique approach to styling the maps it creates, analogous to CSS selectors, it’s a great resource to know about.

26. OpenLayers

It isn’t easy to master, but OpenLayers is arguably the most complete, robust mapping solution discussed here

OpenLayers is probably the most robust of these mapping libraries. The documentation isn’t great and the learning curve is steep, but for certain tasks nothing else can compete. When you need a very specific tool no other library provides, OpenLayers is always there.

27. Kartograph

Kartograph’s projections breathe new life into our standard slippy maps

Kartograph’s tag line is ‘rethink mapping’ and that is exactly what its developers are doing. We’re all used to the Mercator projection, but Kartograph brings far more choices to the table. If you aren’t working with worldwide data, and can place your map in a defined box, Kartograph has the options you need to stand out from the crowd.

28. CartoDB

CartoDB provides an unparalleled way to combine maps and tabular data to create visualisations

CartoDB is a must-know site. The ease with which you can combine tabular data with maps is second to none. For example, you can feed in a CSV file of address strings and it will convert them to latitudes and longitudes and plot them on a map, but there are many other uses. It’s free for up to five tables; after that, there are monthly pricing plans.

29. Processing

Processing provides a cross-platform environment for creating images, animations, and interactions

Processing has become the poster child for interactive visualizations. It enables you to write much simpler code which is in turn compiled into Java.

There is also a Processing.js project to make it easier for websites to use Processing without Java applets, plus a port to Objective-C so you can use it on iOS. It is a desktop application, but can be run on all platforms, and given that it is now several years old, there are plenty of examples and code from the community.

30. NodeBox

NodeBox is a quick, easy way for Python-savvy developers to create 2D visualisations

NodeBox is an OS X application for creating 2D graphics and visualizations. You need to know and understand Python code, but beyond that it’s a quick and easy way to tweak variables and see results instantly. It’s similar to Processing, but without all the interactivity.

31. R

A powerful free software environment for statistical computing and graphics, R is the most complex of the tools listed here

How many other pieces of software have an entire search engine dedicated to them? A statistical package used to parse large data sets, R is a very complex tool, and one that takes a while to understand, but it has a strong community and package library, with more and more packages being produced.

The learning curve is one of the steepest of any of these tools listed here, but you must be comfortable using it if you want to get to this level.

32. Weka

A collection of machine-learning algorithms for data-mining tasks, Weka is a powerful way to explore data

When you get deeper into being a data scientist, you will need to expand your capabilities from just creating visualizations to data mining. Weka is a good tool for classifying and clustering data based on various attributes – both powerful ways to explore data – but it also has the ability to generate simple plots.

33. Gephi

Gephi in action. Coloured regions represent clusters of data that the system is guessing are similar

When people talk about relatedness, social graphs and co-relations, they are really talking about how two nodes are related to one another relative to the other nodes in a network. The nodes in question could be people in a company, words in a document or passes in a football game, but the maths is the same.

Gephi, a graph-based visualiser and data explorer, can not only crunch large data sets and produce beautiful visualizations, but also allows you to clean and sort the data. It’s a very niche use case and a complex piece of software, but it puts you ahead of anyone else in the field who doesn’t know about this gem.

34. iCharts

iCharts can have interactive elements, and you can pull in data from Google Docs

The iCharts service provides a hosted solution for creating and presenting compelling charts for inclusion on your website. There are many different chart types available, and each is fully customisable to suit the subject matter and colour scheme of your site.

Charts can have interactive elements, and can pull data from Google Docs, Excel spreadsheets and other sources. The free account lets you create basic charts, while you can pay to upgrade for additional features and branding-free options.

35. Flot

Create animated visualisations with this jQuery plugin

Flot is a specialised plotting library for jQuery, but it has many handy features and crucially works across all common browsers including Internet Explorer 6. Data can be animated and, because it’s a jQuery plugin, you can fully control all the aspects of animation, presentation and user interaction. This does mean that you need to be familiar with (and comfortable with) jQuery, but if that’s the case, this makes a great option for including interactive charts on your website.
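A minimal hedged sketch, assuming jQuery and Flot are loaded via script tags and that the "#placeholder" div has an explicit width and height (both the container id and the sample data are assumptions):

```ts
// Flot attaches itself to jQuery, so only a global "$" is assumed here.
declare const $: any;

$(() => {
  // One series of [x, y] points, drawn with both lines and points.
  $.plot($('#placeholder'), [[[0, 3], [4, 8], [8, 5], [9, 13]]], {
    series: { lines: { show: true }, points: { show: true } },
  });
});
```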

36. Raphaël

This handy JavaScript library offers a range of data visualisation options

This handy JavaScript library offers a wide range of data visualization options which are rendered using SVG. This makes for a flexible approach that can easily be integrated within your own web site/app code, and is limited only by your own imagination.

That said, it’s a bit more hands-on than some of the other tools featured here (a victim of being so flexible), so unless you’re a hardcore coder, you might want to check out some of the more point-and-click orientated options first!
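For those comfortable with code, here is a small hedged sketch of the paper-and-shapes style Raphaël uses (the "canvas" container id and the shape values are assumptions for illustration):

```ts
import Raphael from 'raphael';

// Raphaël draws onto a "paper" and hands back objects you can style and
// animate; here a single circle grows when hovered.
const paper = Raphael('canvas', 320, 200); // id of a container element
const dot = paper.circle(160, 100, 30);

dot.attr({ fill: '#f90', stroke: 'none' });
dot.hover(
  () => dot.animate({ r: 45 }, 300),
  () => dot.animate({ r: 30 }, 300),
);
```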

37. jQuery Visualize

jQuery Visualize Plugin is an open source charting plugin

Written by the team behind jQuery’s ThemeRoller and jQuery UI websites, jQuery Visualize Plugin is an open source charting plugin for jQuery that uses HTML Canvas to draw a number of different chart types. One of the key features of this plugin is its focus on achieving ARIA support, making it friendly to screen-readers. It’s free to download from this page on GitHub.
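A hedged sketch of typical usage, assuming jQuery and the plugin are already loaded via script tags (the "type" option shown is illustrative rather than a definitive reference):

```ts
// jQuery Visualize reads an accessible HTML <table> already in the page and
// draws a canvas-based chart from it; only a global "$" is assumed here.
declare const $: any;

$(() => {
  // Turn every data table on the page into a bar chart; the original table
  // stays in the DOM, which is what keeps the result screen-reader friendly.
  $('table').visualize({ type: 'bar' });
});
```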

Further reading

  • A great Tumblr blog for visualization examples and inspiration: vizualize.tumblr.com
  • Nicholas Felton’s annual reports are now infamous, but he also has a Tumblr blog of great things he finds.
  • From the guy who helped bring Processing into the world: benfry.com/writing
  • Stamen Design is always creating interesting projects: stamen.com
  • Eyeo Festival brings some of the greatest minds in data visualization together in one place, and you can watch the videos online.

Brian Suda is a master informatician and author of Designing with Data, a practical guide to data visualisation.

Originally posted via “The 37 best tools for data visualization”

 


10 US Graduate Education Opportunities for Aspiring Analytics Professionals
Category: Big Data,Technology Author: Jasmine W. Gordon Date: 5 years ago Comments: 0
image credit: sean macentee via flickr/cc

Big data education has become big business. With massive talent gaps projected in the field of analytics, many professionals with and without quantitative backgrounds are choosing to complete graduate-level training. In response to this demand, universities worldwide are opening the doors to new programs. According to The New York Times, dozens of academically focused graduate programs have been developed quickly in response to both “excitement” and the prospect of six-figure salaries for program graduates.

This list is not an exhaustive compilation of all accredited and non-accredited graduate programs for aspiring data scientists in the US, but it covers some of the dozens of options available to prospective students. Entrance requirements can vary significantly between programs. However, some effort has been made to note whether the programs listed accept students without a strong background in mathematics, programming, or statistics. Prospective students are always encouraged to contact an admissions representative at a program of interest directly for more information on entrance requirements.

Please note that programs are not ranked according to perceived prestige. Additionally, representatives or affiliates of programs not listed in this compilation are encouraged to share their program details and links in the comments.

UNC: Department of Statistics and Operations Research (STOR)

UNC’s STOR program has long been one of the most prestigious players in the statistics space, and remains a training ground for tomorrow’s data scientists. Students interested in research may find rich opportunities to collaborate in projects related to genomics, biological modeling, financial mathematics, stochastic processes, operations and much more.

Admission requirements include a strong background in mathematical, computer and/or decision sciences. A “summer boot camp” program is offered for both entering Master’s and Doctoral-level students, to provide solid groundwork in analysis and linear algebra.

  • Tracks: 1 Bachelor’s Degree, 3 Master’s of Science and 3 PhD options
  • Residency Required: Yes

University of Washington: Data Science

UW’s Professional & Continuing Education options include a new Data Science Certificate, which aims to equip students with the “fundamental tools, techniques, and practical experience” to wrangle very large volumes of data. Coursework includes a focus on applied statistics, machine learning, natural language processing, and more. Additional related graduate programs in Big Data Technologies, Cloud Management, Data Visualization, Machine Learning, and Statistical Analysis are also offered.

According to the program website, individuals currently working in quantitative roles are encouraged to apply. Example backgrounds listed include database administrators, software engineers, data analysts, statisticians, and researchers.

  • Tracks: Certificate in Data Science
  • Residency Required: No; online coursework possible. On-site classes in Seattle and Bellevue, WA.

University of California Irvine Extension Campus: Data Science

UCI offers a number of information technology focused programs through online coursework, including a certificate of Data Science. Students are required to take a minimum of 15 graduate units, 9 of which can consist of specialty focuses in R Programming, Data Preparation, Java, Object-Oriented Design, and other topics.

The program also appears to be a solid choice for individuals who lack the traditional analytical background of a data scientist. The website states it is “intended” for professionals in a “wide variety” of job titles and industries.

  • Tracks: Certificate of Data Science
  • Residency Required: No; online courses

Coursera: Data Science

Johns Hopkins University and SwiftKey offer a course track in Data Science via Coursera, one of the predominant players in the massive open online courses (MOOC) space. Students will gain experience in R Programming, Data Cleaning, Analysis, Statistical Inference, and much more.

With a total cost of just $470 for all ten of the courses listed, it’s significantly cheaper than many other options available to aspiring analytics professionals. While some Universities may choose to grant credit for Coursera Certifications, it’s recommended that prospective students inquire directly with organizations about credit transferability and corporate reimbursement opportunities before enrolling.

The program FAQ specifies that students are recommended to have some programming experience in “any language,” as well as a “working knowledge” of algebra, though calculus and linear algebra are not required for successful completion of the program.

  • Tracks: Data Science Specialization
  • Residency Required: No

New York University: Center for Data Science

New York University’s Center for Data Science represents a “university-wide initiative” to furnish skilled analytics researchers and professionals. Combining academics from some 18 centers across NYU’s campus, the programs offered range greatly in specialization. In addition to a Master’s of Science in Data Science, students can opt to dive deep into programs focused on business analytics, scientific computing, digital marketing, and other specialties.

Admission requirements can vary significantly according to the selected program; a PhD in Biostatistics may have very different entrance requirements than an Advanced Certificate in Applied Urban Science and Informatics.

  • Tracks: 9 Master’s, 4 PhDs, and 2 Graduate Certificate options. Additionally, data science coursework may be taken as electives by students in some 19 related graduate programs.
  • Residency Required: Yes

Lipscomb University: Data Science

Lipscomb University in Nashville, TN offers graduate coursework for aspiring and current data scientists, geared towards students currently working in professional capacities. With 30 required credits and a practicum research project required for graduation, students will gain hands-on experience in information structures, statistical analysis and decision modeling, research methods, data mining, and more.

Stated requirements for admission, according to the program website, include a graduate degree or a bachelor’s degree in relevant fields of study. In some cases, work experience can be considered in lieu of GRE scores.

  • Tracks: Graduate Certificate, Master’s Degree
  • Residency Required: Yes; night courses

Arizona State University: Business Analytics

Individuals with a strong interest in business applications of analytics may be drawn to ASU’s coursework. The 9-month program includes coursework in decision modeling, data modeling, and the application of regression models. Capstone projects for graduates are completed in professional settings. Students are able to complete practicum projects at large corporations located near the ASU campus.

ASU’s website specifies that recent graduates of Bachelor’s-level programs in quantitative disciplines are encouraged to apply, as well as any other candidates with “demonstrated proficiency” in statistics, programming and calculus.

  • Tracks: Master’s of Science in Business Analytics
  • Residency Required: Yes; online program is in-development

Indiana University Bloomington: Data Science

There are two separate options available for Master’s of Data Science students at IU-B. Graduates can complete either a technical or an applied track, known as the “decision maker” path, which includes a special focus on managing analytical projects and strategy. Technical coursework includes a focus on algorithms, security, and cloud topics.

Admission requirements include undergraduate coursework in informatics, computer science, information science, natural sciences, social sciences, or communications. However, IUB’s website also states that candidates who “demonstrate the competency necessary” will also be considered.

  • Tracks: Master’s of Science in Data Science
  • Residency Required: No; online coursework available

Southern Methodist University: Data Science

SMU’s Master’s of Data Science program has a particular focus on project-based learning, in which students are encouraged to draw from a baseline understanding of computer science, data science, and other topics to solve real-world problems and situations.

Admission requirements are largely related to the candidate’s “ability to succeed in the program,” which the admissions committee will determine based on a combination of undergraduate grades, graduate test scores, and professional work experience.

  • Tracks: Master’s of Science in Data Science
  • Residency Required: Not exclusively. 30 credits may be completed online, and 1 credit is earned on-site in Dallas, TX.

University of California, Berkeley: Information and Data Science

UCB’s new Master’s of Information and Data Science is designed to prepare professionals for successful entry into analytics careers. The online coursework focuses on guiding students to solve real-world scenarios, with an emphasis on skills in current tools and methodologies.

UCB has competitive requirements for graduate admission, and the Information and Data Science track is no exception. Applicants are recommended to have an undergraduate GPA of at least 3.0, and either high scores on the quantitative section of the GRE or professional experience. Fundamental groundwork is also required, though applicants may take preparatory courses through UCB. Additional stated requirements relate to the ability to program and communicate effectively.

  • Tracks: Master’s of Science in Information and Data Science
  • Residency Required: Not exclusively. Coursework is delivered online, but students are required to attend a 5-day immersion program on-site in Berkeley, CA

Recommended Resources/Further Reading:

Are you a graduate or affiliate of any listed or unlisted data science or analytics program? Please share your experience in the comments!

 

Originally posted at: http://analyticsweek.com/content/10-us-graduate-education-opportunities-for-aspiring-analytics-professionals/


New MIT algorithm rubs shoulders with human intuition in big data analysis
Category: Big Data Author: AnalyticsWeek Pick Date: 5 years ago Comments: 0

We all know that computers are pretty good at crunching numbers. But when it comes to analyzing reams of data and looking for important patterns, humans still come in handy: We’re pretty good at figuring out what variables in the data can help us answer particular questions. Now researchers at MIT claim to have designed an algorithm that can beat most humans at that task.

[AI can now muddle its way through the math SAT about as well as you can]

Max Kanter, who created the algorithm as part of his master’s thesis at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) along with his advisor Kalyan Veeramachaneni, entered the algorithm into three major big data competitions. In a paper to be presented this week at the IEEE International Conference on Data Science and Advanced Analytics, they announced that their “Data Science Machine” has beaten 615 of the 906 human teams it’s come up against.

The algorithm didn’t get the top score in any of its three competitions. But in two of them, it created models that were 94 percent and 96 percent as accurate as those of the winning teams. In the third, it managed to create a model that was 87 percent as accurate. The algorithm used raw datasets to make models predicting things such as when a student would be most at risk of dropping an online course, or what indicated that a customer during a sale would turn into a repeat buyer.

Kanter and Veeramachaneni’s algorithm isn’t meant to throw human data scientists out — at least not anytime soon. But since it seems to do a decent job of approximating human “intuition” with much less time and manpower, they hope it can provide a good benchmark.

[MIT researchers can listen to your conversation by watching your potato chip bag]

“If the Data Science Machine performance is adequate for the purposes of the problem, no further work is necessary,” they wrote in the study.

That might not be sufficient for companies relying on intense data analysis to help them increase profits, but it could help answer data-based questions that are being ignored.

“We view the Data Science Machine as a natural complement to human intelligence,” Kanter said in a statement. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”

This post has been updated to clarify that Kalyan Veeramachaneni also contributed to the study. 

View original post HERE.

