Data Sciences Certificate Program Internship

Timeline
Experience start: October 16, 2023
Experience end: December 8, 2023
Experience scope
Categories
Machine learning, Data visualization, Data analysis, Data modelling, Data science
Skills
Python, data engineering
Student consultants are ready to use their technical and analytical skills to support data science and data engineering teams. Students with at least a beginner proficiency in Python will learn applied machine learning using scikit-learn in two phases. The first phase focuses on software engineering for data science and includes skills building with relational and NoSQL databases, data ingestion and wrangling, as well as software engineering topics such as object oriented programming, testing, security, and agile workflows. The second phase focuses on distributional statistical analysis, classification, clustering, and regression modeling using generalized linear models, non-parametric models, probabilistic models, ensemble models, and multi-layer perceptrons. Students will develop model selection techniques using visual diagnostics to interpret model behavior and to find the simplest, most predictive model for a data set using the bias/variance trade-off and analysis and selection of model triples that include algorithm, features, and hyperparameters.
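As a rough illustration of what selecting a model triple (algorithm, features, hyperparameters) can look like, here is a minimal sketch. It assumes scikit-learn's bundled wine data set, and the feature subsets and hyperparameter values are hypothetical choices made only for this example; it is not the program's actual curriculum code.

```python
from itertools import product

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Hypothetical candidate feature subsets (column indices), for illustration only.
feature_sets = {
    "all": list(range(X.shape[1])),
    "first_five": [0, 1, 2, 3, 4],
}

# Candidate algorithms, each paired with one assumed hyperparameter setting.
candidates = [
    ("logreg", LogisticRegression, {"C": 0.1, "max_iter": 1000}),
    ("logreg", LogisticRegression, {"C": 1.0, "max_iter": 1000}),
    ("forest", RandomForestClassifier, {"n_estimators": 50, "random_state": 0}),
]

# Score every (algorithm, features, hyperparameters) triple with 5-fold cross-validation.
results = []
for (feat_name, cols), (algo_name, cls, params) in product(feature_sets.items(), candidates):
    model = make_pipeline(StandardScaler(), cls(**params))
    score = cross_val_score(model, X[:, cols], y, cv=5).mean()
    results.append((score, algo_name, feat_name, params))

# Pick the triple with the highest mean cross-validated accuracy.
best_score, best_algo, best_features, best_params = max(results, key=lambda r: r[0])
print(best_algo, best_features, best_params, round(best_score, 3))
```

In practice the comparison would also weigh model simplicity against the score, which is where the bias/variance trade-off and visual diagnostics come in.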
Learners will work on one main project over the course of the semester, connecting with you as needed through virtual communication tools.
Learners
Deliverables will vary depending on the scope of the project. Example project deliverables include, but are not limited to:
- Presentations on data characteristics and model opportunities
- Trained scikit-learn model and instance data sets
- Data wrangling and analytics Python code
- Documentation and visualizations related to model training
Project Examples
Requirements
Students can work on projects that follow the data science pipeline: data ingestion, wrangling, computational data storage, modeling and analytics, and visual diagnostics. Students will be able to conduct hypothesis-driven development to build models from data sets that might be used in business decision-making or software applications.
Student work will include, but is not limited to:
- Identifying model hypotheses for classification, regression, and clustering in available data sets.
- Ingesting and wrangling data into a computational data store (both relational and non-relational databases) so that it can be queried and visualized.
- Using Python tools to ingest, clean, and wrangle data sets in a variety of data formats.
- Using Python and SQL to perform distributional analysis and to visualize data.
- Using scikit-learn to perform feature extraction and build transformer pipelines that prepare data for machine learning.
- Using scikit-learn to train classification, regression, and clustering models on data.
- Using Yellowbrick to perform visual analytics and diagnostics on trained models in order to select the simplest, most predictive model.
- Communicating analysis results clearly.
- Explaining ethical considerations in machine learning and identifying bias and fairness issues in modeling.
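To make the pipeline stages above more concrete, the following is a minimal sketch of ingestion, storage in a relational database, a SQL distributional summary, and a scikit-learn transformer pipeline. It assumes scikit-learn's bundled iris data set, an in-memory SQLite database, and column names invented for this example; a real project would use the partner company's own data and data store. Visual diagnostics (for example with Yellowbrick) are left out to keep the sketch short.

```python
import sqlite3

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Ingest: load a bundled sample data set and assign SQL-friendly column names
# (these names are assumptions made for this example).
frame = load_iris(as_frame=True).frame
frame.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# Store: write the wrangled table to an in-memory relational database.
conn = sqlite3.connect(":memory:")
frame.to_sql("iris", conn, index=False)

# Query: a simple per-class distributional summary computed in SQL.
summary = pd.read_sql_query(
    "SELECT species, COUNT(*) AS n, AVG(petal_length) AS mean_petal_length "
    "FROM iris GROUP BY species ORDER BY species",
    conn,
)
print(summary)

# Model: read the data back and fit a transformer + estimator pipeline.
data = pd.read_sql_query("SELECT * FROM iris", conn)
conn.close()
X = data.drop(columns="species")
y = data["species"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

pipeline = Pipeline([
    ("scale", StandardScaler()),                 # transformer step
    ("clf", LogisticRegression(max_iter=1000)),  # classification model
])
pipeline.fit(X_train, y_train)

# Evaluate on the held-out split.
accuracy = pipeline.score(X_test, y_test)
print(round(accuracy, 3))
```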
Additional company criteria
Companies must answer the following questions to submit a match request to this experience: