Riipen | Data Sciences Certificate Program Internship

Data Sciences Certificate Program Internship

DATA 1001

Closed

Georgetown University

Washington, District of Columbia, United States

Lisa Andrews

Director of Integrated Career Development


Timeline

Experience start: October 16, 2023
Experience end: December 8, 2023

Experience

5/5 project matches

Dates set by experience

Preferred companies

Anywhere

Any company type

Any industries

Experience scope

Skills

Python, data engineering

Learner goals and capabilities

Student consultants are ready to use their technical and analytical skills to support data science and data engineering teams. Students with at least beginner proficiency in Python will learn applied machine learning using scikit-learn in two phases. The first phase focuses on software engineering for data science and builds skills with relational and NoSQL databases, data ingestion and wrangling, and software engineering topics such as object-oriented programming, testing, security, and agile workflows. The second phase focuses on distributional statistical analysis, classification, clustering, and regression modeling using generalized linear models, non-parametric models, probabilistic models, ensemble models, and multi-layer perceptrons. Students will develop model selection techniques that use visual diagnostics to interpret model behavior. The goal is to find the simplest, most predictive model for a data set by weighing the bias/variance trade-off and by analyzing and selecting model triples, each consisting of an algorithm, a set of features, and hyperparameters.
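The idea of selecting a model triple (algorithm, features, hyperparameters) can be sketched with scikit-learn's Pipeline and GridSearchCV. This is only an illustrative sketch, not the course's own material: the synthetic data set, the two candidate algorithms, the SelectKBest feature step, and the grid values are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real project data set (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

# One pipeline whose steps cover the "features" and "algorithm" parts of a triple.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression()),  # placeholder, swapped by the grid below
])

# Each dict varies the algorithm, the number of features kept, and one hyperparameter.
param_grid = [
    {
        "select__k": [4, 10],
        "model": [LogisticRegression(max_iter=1000)],
        "model__C": [0.1, 1.0],
    },
    {
        "select__k": [4, 10],
        "model": [KNeighborsClassifier()],
        "model__n_neighbors": [3, 7],
    },
]

# 5-fold cross-validation scores every candidate triple on held-out folds.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X, y)

best_triple = search.best_params_
print(best_triple, round(search.best_score_, 3))
```

Comparing cross-validated scores across the grid is one common way to look for the simplest model that still predicts well; in practice the scores would be read alongside visual diagnostics rather than on their own.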

Learners will work on one main project over the course of the semester, connecting with you as needed with virtual communication tools.

Learners

Certificate

Any level

20 learners

Project

30 hours per learner

Educators assign learners to projects

Individual projects

Expected outcomes and deliverables

Deliverables will vary depending on the scope of the project. Example project deliverables include, but are not limited to:

Presentations on data characteristics and model opportunities
Trained scikit-learn model and instance data sets
Data wrangling and analytics Python code
Documentation and visualizations related to model training

Project timeline

Experience start: October 16, 2023
Experience end: December 8, 2023

Project Examples

Requirements

Students can work on projects that follow the data science pipeline: data ingestion, wrangling, computational data storage, modeling and analytics, and visual diagnostics. Students will be able to conduct hypothesis-driven development to build models from data sets that might be used in business decision-making or software applications.

Student work will include, but is not limited to:

Identifying model hypotheses for classification, regression, and clustering in available datasets.
Ingesting and wrangling data into a computational data store (both relational and non-relational databases) so that it can be queried and visualized.
Using Python tools to ingest, clean, and wrangle data sets in a variety of data formats.
Using Python and SQL to perform distributional analysis and to visualize data.
Using scikit-learn to perform feature extraction and build transformer pipelines that prepare data for machine learning.
Using scikit-learn to train classification, regression, and clustering models on data.
Using Yellowbrick to perform visual analytics and diagnostics on trained models in order to select the simplest, most predictive model.
Communicating analysis results clearly.
Explaining ethical considerations in machine learning and identifying bias and fairness issues in modeling.
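
The ingestion, wrangling, and SQL analysis items above might look roughly like the following sketch, which uses only the Python standard library. The CSV content, the `readings` table, and the `ingest` and `summarize` helpers are hypothetical names invented for illustration; a real project would use the data and database supplied by the company.

```python
import csv
import io
import sqlite3

# Hypothetical raw data with a missing value and inconsistent casing.
RAW_CSV = """city,temp_c
Washington,21.5
Washington,
Boston,18.0
boston,19.0
"""


def ingest(conn, text):
    """Clean CSV rows and load them into a relational table; return rows loaded."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (city TEXT, temp_c REAL)")
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        if not rec["temp_c"].strip():
            continue  # wrangling step: skip rows with no measurement
        rows.append((rec["city"].strip().title(), float(rec["temp_c"])))
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def summarize(conn):
    """Per-city count and mean, a very small distributional summary in SQL."""
    cur = conn.execute(
        "SELECT city, COUNT(*), AVG(temp_c) FROM readings "
        "GROUP BY city ORDER BY city"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
loaded = ingest(conn, RAW_CSV)
summary = summarize(conn)
print(loaded, summary)
```

Once the data sits in a queryable store, the same table can feed visualizations or be read into a scikit-learn pipeline for feature extraction and model training.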

Additional company criteria

Companies must answer the following questions to submit a match request to this experience:

Q1 - Checkbox

Be available for a quick phone/virtual call with the instructor to initiate your relationship and confirm your scope is an appropriate fit for the course. *

Q2 - Checkbox

Provide a dedicated contact who will act as the student's primary supervisor over the duration of the virtual placement. A secondary contact should be provided as a backup. Students will interact virtually with their primary contact regularly and as needed. *

Q3 - Checkbox

Provide an opportunity for students to present their work and receive feedback. *

Q4 - Checkbox

Provide relevant information/data as needed for the project. *
