Data Mining & Machine Learning with R Training Course
R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has a wide variety of packages for data mining.
Course Outline
Introduction to Data mining and Machine Learning
- Statistical learning vs. Machine learning
- Iteration and evaluation
- Bias-Variance trade-off
Regression
- Linear regression
- Generalizations and Nonlinearity
- Exercises
Classification
- Bayesian refresher
- Naive Bayes
- Discriminant analysis
- Logistic regression
- K-Nearest neighbors
- Support Vector Machines
- Neural networks
- Decision trees
- Exercises
Cross-validation and Resampling
- Cross-validation approaches
- Bootstrap
- Exercises
Unsupervised Learning
- K-means clustering
- Examples
- Challenges of unsupervised learning and beyond K-means
Advanced topics
- Ensemble models
- Mixed models
- Boosting
- Examples
Dimensionality reduction
- Factor Analysis
- Principal Component Analysis
- Examples
Requirements
This course is part of the Data Scientist skill set (Domain: Analytical Techniques and Methods)
Testimonials (1)
The trainer was so knowledgeable and included areas I was interested in.
Mohamed Salama
Course - Data Mining & Machine Learning with R
Related Courses
Cluster Analysis with R and SAS
14 Hours
This instructor-led, live training in Botswana (online or onsite) is aimed at data analysts who wish to program with R in SAS for cluster analysis.
By the end of this training, participants will be able to:
- Use cluster analysis for data mining.
- Master R syntax for clustering solutions.
- Implement hierarchical and non-hierarchical clustering.
- Make data-driven decisions to help improve business operations.
From Data to Decision with Big Data and Predictive Analytics
21 Hours
Audience
If you are trying to make sense of the data you have access to, or want to analyse unstructured data available on the net (such as Twitter, LinkedIn, etc.), this course is for you.
It is mostly aimed at decision makers and people who need to choose what data is worth collecting and what is worth analysing.
It is not aimed at people configuring the solution, although they will benefit from the big picture.
Delivery Mode
During the course, delegates will be presented with working examples of mostly open-source technologies.
Short lectures will be followed by presentations and simple exercises by the participants.
Content and Software used
All software used is updated each time the course is run, so the newest available versions are covered.
The course covers the whole process, from obtaining, formatting, processing, and analysing the data to explaining how to automate the decision-making process with machine learning.
Data Mining and Analysis
28 Hours
Objective:
Delegates will be able to analyse big data sets, extract patterns, and choose the right variables impacting the results, so that a new model can be built to produce predictive results.
Data Mining
21 Hours
The course can be provided with any tools, including free, open-source data mining software and applications.
Data Mining with Python
14 Hours
This instructor-led, live training (online or onsite) is aimed at data analysts and data scientists who wish to implement more advanced data analytics techniques for data mining using Python.
By the end of this training, participants will be able to:
- Understand important areas of data mining, including association rule mining, text sentiment analysis, automatic text summarization, and data anomaly detection.
- Compare and implement various strategies for solving real-world data mining problems.
- Understand and interpret the results.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Data Mining with R
14 Hours
R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has a wide variety of packages for data mining.
Data Vault: Building a Scalable Data Warehouse
28 Hours
In this instructor-led, live training in Botswana, participants will learn how to build a Data Vault.
By the end of this training, participants will be able to:
- Understand the architecture and design concepts behind Data Vault 2.0, and its interaction with Big Data, NoSQL and AI.
- Use data vaulting techniques to enable auditing, tracing, and inspection of historical data in a data warehouse.
- Develop a consistent and repeatable ETL (Extract, Transform, Load) process.
- Build and deploy highly scalable and repeatable warehouses.
Data Visualization
28 Hours
This course is intended for engineers and decision makers working in data mining and knowledge discovery.
You will learn how to create effective plots and how to present your data in a way that appeals to decision makers and helps them understand hidden information.
Data Mining with Excel
14 Hours
This instructor-led, live training in Botswana (online or onsite) is aimed at data scientists who wish to use Excel for data mining.
By the end of this training, participants will be able to:
- Explore data with Excel to perform data mining and analysis.
- Use Microsoft algorithms for data mining.
- Understand concepts in Excel data mining.
Data Mining with Weka
14 Hours
This instructor-led, live training in Botswana (online or onsite) is aimed at beginner to intermediate-level data analysts and data scientists who wish to use Weka to perform data mining tasks.
By the end of this training, participants will be able to:
- Install and configure Weka.
- Understand the Weka environment and workbench.
- Perform data mining tasks using Weka.
Data Science for Big Data Analytics
35 Hours
Big data refers to data sets so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and information privacy.
Foundation R
7 Hours
This instructor-led, live training in Botswana (online or onsite) is aimed at beginner-level professionals who wish to gain a mastery of the fundamentals of R and how to work with data.
By the end of this training, participants will be able to:
- Understand the R programming environment and RStudio interface.
- Import, manipulate, and explore datasets using R commands and packages.
- Perform basic statistical analysis and data summarization.
- Generate visualizations using both base R and ggplot2.
- Manage workspaces, scripts, and packages effectively.
KNIME Analytics Platform for BI
21 Hours
KNIME Analytics Platform is a leading open source option for data-driven innovation, helping you discover the potential hidden in your data, mine for fresh insights, or predict new futures. With more than 1000 modules, hundreds of ready-to-run examples, a comprehensive range of integrated tools, and the widest choice of advanced algorithms available, KNIME Analytics Platform is the perfect toolbox for any data scientist and business analyst.
This course for KNIME Analytics Platform is an ideal opportunity for beginners, advanced users, and KNIME experts to be introduced to KNIME, to learn how to use it more effectively, and to learn how to create clear, comprehensive reports based on KNIME workflows.
KNIME Analytics Platform - Comprehensive Training
35 Hours
The "Analytics Platform KNIME" training offers a comprehensive overview of this free data analytics platform. The program includes an introduction to data processing and analysis, installation and configuration of KNIME, building workflows, a methodology for creating business models, and data modeling. The course also covers advanced data analysis tools, workflow import and export, tool integration, ETL processes, data mining, visualization, extensions, and integrations with tools such as R, Java, Python, Gephi, and Neo4j. It concludes with an overview of reporting and integration with BIRT and KNIME WebPortal.
Introduction to Data Visualization with Tidyverse and R
7 Hours
The Tidyverse is a collection of versatile R packages for cleaning, processing, modeling, and visualizing data. Some of the packages included are: ggplot2, dplyr, tidyr, readr, purrr, and tibble.
In this instructor-led, live training, participants will learn how to manipulate and visualize data using the tools included in the Tidyverse.
By the end of this training, participants will be able to:
- Perform data analysis and create appealing visualizations
- Draw useful conclusions from various datasets of sample data
- Filter, sort and summarize data to answer exploratory questions
- Turn processed data into informative line plots, bar plots, and histograms
- Import and filter data from diverse data sources, including Excel, CSV, and SPSS files
Audience
- Beginners to the R language
- Beginners to data analysis and data visualization
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice