Computer Science Principles Unit Post AP Chapter 1 Lesson 5: Cleaning Data

Learning Resource Type

Classroom Resource

Subject Area

Digital Literacy and Computer Science

Grade(s)

9, 10, 11, 12

Overview

In this lesson, students begin working with the data that they have been collecting in the class "data tracker" since the first lesson of the chapter. They are introduced to the first step in analyzing data: cleaning the data. Students will follow a guide in Code Studio that demonstrates the common techniques of filtering and sorting data, so that they can familiarize themselves with the contents of the dataset. Then they will correct errors they find in the data by either hand-correcting invalid values or deleting them. Finally, they will categorize any free-text columns that were collected to prepare them for analysis. This lesson introduces many new skills with spreadsheets and reveals the sometimes subjective nature of data analysis.

Students will be able to:
- filter and sort a dataset using a spreadsheet tool.
- identify and correct invalid values in a dataset with the aid of computational tools.
- justify the need to clean data prior to analyzing it with computational tools.
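
The lesson itself uses a spreadsheet tool, but the same cleaning steps can be sketched in a few lines of Python for teachers who want a programmatic comparison. This is a minimal illustration, not part of the Code.org materials: the rows, column names, the `corrections` table, and the keyword rules in `categorize` are all hypothetical stand-ins for a class "data tracker".

```python
# Hypothetical rows standing in for a class "data tracker".
# Column names and values are invented for illustration only.
rows = [
    {"name": "Ana", "hours_sleep": "7", "activity": "Soccer practice"},
    {"name": "Ben", "hours_sleep": "seven", "activity": "played soccer"},
    {"name": "Cy", "hours_sleep": "25", "activity": "reading a book"},
    {"name": "Dee", "hours_sleep": "8.5", "activity": "Read"},
    {"name": "Eli", "hours_sleep": "6", "activity": "napping"},
]

# Hand-corrections: values a person has decided how to fix (assumed).
corrections = {"seven": "7"}


def parse_hours(text):
    """Return hours as a float, or None if the value is invalid."""
    try:
        hours = float(text)
    except ValueError:
        return None
    # A day has 24 hours, so anything outside 0..24 is treated as invalid.
    return hours if 0 <= hours <= 24 else None


def categorize(activity):
    """Map a free-text answer to a category by keyword.

    The keyword choices are subjective; another analyst might group
    the same answers differently.
    """
    text = activity.lower()
    if "soccer" in text:
        return "sports"
    if "read" in text:
        return "reading"
    return "other"


# Clean: apply hand-corrections, then delete rows that are still invalid.
cleaned = []
for row in rows:
    raw = corrections.get(row["hours_sleep"], row["hours_sleep"])
    hours = parse_hours(raw)
    if hours is None:
        continue  # delete the row rather than guess a value
    cleaned.append(
        {
            "name": row["name"],
            "hours_sleep": hours,
            "category": categorize(row["activity"]),
        }
    )

# Sort: most sleep first.
by_sleep = sorted(cleaned, key=lambda r: r["hours_sleep"], reverse=True)

# Filter: only students who slept less than 7 hours.
short_sleepers = [r for r in cleaned if r["hours_sleep"] < 7]
```

In this sketch the value "seven" is hand-corrected while "25" is deleted, which mirrors the two options students have in the lesson. The choice of categories shows where the analysis becomes subjective.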

Note: You will need to create a free account on code.org before you can view this resource.

Digital Literacy and Computer Science (2018) Grade(s): 09-12

DLCS18.HS.32

Use data analysis tools and techniques to identify patterns in data representing complex systems.

UP:DLCS18.HS.32

Vocabulary

  • data mining

Knowledge

Students know:
  • how to identify patterns in data.
  • how to select and apply data analysis tools and techniques.
  • how to use data analysis tools and techniques to identify patterns in data representing complex systems.

Skills

Students are able to:
  • evaluate data sets.
  • select and apply data analysis tools and techniques.
  • use technology to mine data.

Understanding

Students understand that:
  • data can be important in a problem-solving process.
  • tools exist to aid in the processing of complex data sets.
  • it can be more efficient to allow a program to identify patterns in a complex data set.

Digital Literacy and Computer Science (2018) Grade(s): 09-12

DLCS18.HS.37

Evaluate the ability of models and simulations to test and support the refinement of hypotheses.

UP:DLCS18.HS.37

Vocabulary

  • model
  • simulations
  • hypotheses
  • phenomena
  • target system

Knowledge

Students know:
  • how to explain the use of models and simulations to generate new knowledge and understanding related to the phenomena or target system that is being studied.
  • how to explain the ability of models and simulations to test and support the refinement of hypotheses related to phenomena under consideration.
a.
  • that models and simulations are a way to extrapolate and interpolate untested situations and scenarios to help formulate, test, and refine hypotheses.
b.
  • how to form a hypothesis.
  • how to test a hypothesis.
  • how to create a model or simulation.
c.
  • that simulations or models can be created to test a hypothesis but may not provide the information expected or intended.
  • that it is vital to verify the data being generated by a model or simulation.

Skills

Students are able to:
  • use a diagram or program to represent a model to express key properties of a phenomenon or target system.
  • research existing models and simulations and how they are used to test and refine hypotheses.
  • explain how existing models and simulations are used to test and support the refinement of hypotheses.
a.
  • create a model or simulation to formulate, test, and refine a hypothesis.
  • utilize a model or simulation to formulate, test, and refine a hypothesis.
b.
  • form a model of a hypothesis.
  • test the hypothesis by collecting and analyzing data from a simulation.
c.
  • examine a model or simulation to determine the correctness of the generated data.
  • examine a flawed model or simulation and identify areas in which it is providing incorrect data.

Understanding

Students understand that:
  • a simulation is based on a model and enables observation of the system as key properties change.
  • the accuracy of models and simulations is limited by the level of detail and quality of information used and by the software and hardware used.
  • models and simulations are an effective and cost-efficient way to understand phenomena and to test and refine hypotheses.
a.
  • models and simulations are a way to extrapolate and interpolate untested situations and scenarios to help formulate, test, and refine hypotheses.
  • models and simulations can be the only cost- or time-effective way to test a hypothesis.
b.
  • models and simulations can save money, are safer, usually require less time, and do not have the environmental impact that a full experiment or operational test may have.
c.
  • while a process may operate without errors, that does not guarantee that the process is providing accurate data to meet your needs.

CR Resource Type

Lesson/Unit Plan

Resource Provider

Code.org

License Type

Custom