Center for Reproductive Health

Workshop Data Management

Most biostatistics courses teach you how to analyze datasets that are already prepared for analysis; they do not teach you how to create those datasets. In real-world data analysis, creating the analysis datasets often requires more time and skill than conducting the statistical analyses themselves. The purpose of this workshop is to teach participants the core data management skills needed to create datasets ready for statistical analysis. Participants will be introduced to a data entry program whose files can be exported to many data formats, including Stata and SPSS. These skills will help researchers improve the quality of the data behind a paper or report. The workshop uses Stata software, which offers an excellent combination of data manipulation capabilities and user-friendliness. Stata also provides a wide range of statistical analysis techniques, along with programs written by many experts. In addition, Stata can produce reproducible results, as scientific investigation requires.

By the end of the workshop participants will be able to:

  • Demonstrate how to enter data from a questionnaire (paper or electronic form) using EpiData software,
  • Demonstrate how to use EpiData files in other statistical packages, including Stata and SPSS,
  • Formulate the data analysis process: 1) define the study question, 2) collect the data, 3) clean the data, 4) analyze the data, and 5) visualize the data and share the findings,
  • Appraise variables in the datasets from other data sources (IDHS, GEAS, and IFLS) using many data formats, including Excel spreadsheets, SPSS, and ASCII files,
  • Formulate the main dependent and independent variables, as well as covariates, in the form of an analytical framework diagram,
  • Investigate data structure, identify errors in data, fix data errors, and confirm that variables have been created correctly,
  • Create analysis datasets that merge data from multiple sources, such as parent and adolescent datasets,
  • Create longitudinal datasets that append data from multiple periods,
  • Create variables that require calculations across observations and files,
  • Reshape the structure of analysis datasets, for example by converting a dataset that has one row per person and one column for each year into a dataset that has one row for each person-year,
  • Increase the efficiency and reproducibility of results by conducting all steps of data analysis from within Stata do-files (reading in data; investigating and cleaning data; creating analysis variables; running analyses; and presenting results),
  • Increase productivity by learning how to automate iterative tasks rather than writing separate commands for each task, and
  • Demonstrate how to produce reproducible analyses and reports that are acceptable to scientific journals.
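
The workshop itself uses Stata do-files. Purely as an illustration of the reshaping objective in the list above, here is a minimal Python sketch of converting one row per person into one row per person-year. The data, the column names (`person_id`, `income_2019`, `income_2020`), and the helper `wide_to_long` are all hypothetical and are not taken from the workshop materials.

```python
# Hypothetical wide-format records: one row per person, one column per year.
wide_rows = [
    {"person_id": 1, "income_2019": 100, "income_2020": 120},
    {"person_id": 2, "income_2019": 90, "income_2020": 95},
]


def wide_to_long(rows, id_key, stub):
    """Return one record per person-year from columns named '<stub>_<year>'."""
    prefix = stub + "_"
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(prefix):
                # The text after the prefix is taken to be the year.
                year = int(column[len(prefix):])
                long_rows.append({id_key: row[id_key], "year": year, stub: value})
    return long_rows


long_rows = wide_to_long(wide_rows, "person_id", "income")
for record in long_rows:
    print(record)
```

Looping over the year columns, rather than writing one statement per year, also reflects the objective of automating iterative tasks.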

About the instructor

Siswanto Agus Wilopo is a Professor of Population Health and a Senior Researcher at the Center for Reproductive Health, Faculty of Medicine, Public Health, and Nursing, Universitas Gadjah Mada, Yogyakarta, Indonesia. He is also an adjunct/visiting Full Professor at the College of Health and Agricultural Sciences, University College Dublin, Ireland. In the global health field, his main current interest is global health systems and financing, including financing for reproductive health services and for addressing gender-based violence (GBV). His current research addresses issues affecting adolescents, including a multi-country study on global early adolescent health (GEAS) and a study on mental health (NAMHS), with researchers from more than 35 countries.

Teaching assistants

Dr. Ifta Choirriyah, MSPH, Ph.D. and Drs. Althaf Setiawan, MPH

Researchers at the Center for Reproductive Health, Faculty of Medicine, Public Health, and Nursing, Universitas Gadjah Mada, Yogyakarta, Indonesia