Dataset information
Available languages
French
Keywords
up13, enseignement-superieur, universite, seine-saint-denis, universite-paris-13, scolarite
Dataset description
This dataset is updated annually. The description below relates to the first year of online release; updates followed in 2018 (data for 2008-2017) and 2019 (data for 2009-2018).
Paris 13 University recorded student registration data in its information system (Apogee software) for each academic year between 2006(-2007) and 2015(-2016). These data cover the diplomas prepared, the steps leading to them, the registration scheme (for example initial training or apprenticeship), the components concerned (UFR, IUT, etc.), and the origin of students (type of baccalaureate, academy of origin, nationality). Each entry describes a student's main enrolment at the university for one year. The attributes of these data are as follows.
— CODE_INDIVIDU Masked data
— ANNEE_INSCRIPTION Year of registration: 2006 for 2006-2007, etc.
— LIB_DIPLOME Diploma Name
— NIVEAU_DANS_LE_DIPLOME 1, 2,... for master 1, licence 2, etc.
— NIVEAU_APRES_BAC 1, 2,... for Bac+1, Bac+2,...
— LIBELLE_DISCIPLINE_DIPLOME Attachment of the diploma to a discipline
— CODE_SISE_DIPLOME Student Tracking Information System Code
— CODE_ETAPE Internal code of a step (year, course) of a diploma
— LIBELLE_COURT_ETAPE Short name of step
— LIBELLE_LONG_ETAPE More intelligible name of the step
— LIBELLE_COURT_COMPOSANTE Name of the component (UFR, IUT, etc.)
— CODE_COMPOSANT Numeric code of the component (unused)
— REGROUPEMENT_BAC Type of Bac (L, ES, S, techno STMG, techno ST2S,...)
— LIBELLE_ACADEMIE_BAC Academy of the Bac (Creteil, Versailles, abroad,...)
— Continent Derived from nationality, which is itself masked data
— LIBELLE_REGIME Initial training, continuing education, professional, apprenticeship
Paris 13 University publishes part of this dataset through several resources, while respecting the anonymity of its students.
Starting from the 213,289 entries that correspond to all enrolments of the 106,088 individuals who studied at Paris 13 University during the ten academic years between 2006(-2007) and 2015(-2016), we produced several resources, each covering part of the data. To produce each resource we chose a small number of attributes, then removed a small proportion of the entries in order to satisfy a k-anonymisation constraint with k = 5, i.e. to ensure that, in each resource, every entry appears identically at least 5 times (otherwise the entry is deleted). The four resources produced are the following files.
— The file ‘up13_etapes.csv’ concerns the diploma steps. It contains the attributes “CODE_ETAPE”, “LIBELLE_COURT_ETAPE”, “LIBELLE_LONG_ETAPE”, “NIVEAU_APRES_BAC”, “LIBELLE_COURT_COMPOSANTE”, “LIBELLE_DISCIPLINE_DIPLOME”, “CODE_SISE_DIPLOME”, “NIVEAU_DANS_LE_DIPLOME”, and its anonymisation causes the loss of 918 entries.
— The file ‘up13_Academie.csv’ concerns the Bac academy. It contains the attributes “LIBELLE_ACADEMIE_BAC”, “NIVEAU_APRES_BAC”, “NIVEAU_DANS_LE_DIPLOME”, “CONTINENT”, “LIBELLE_REGIME”, “LIB_DIPLOME”, “LIBELLE_COURT_COMPOSANTE”, and its anonymisation causes the loss of 7,525 entries.
— The file ‘up13_Bac.csv’ concerns the type of Bac and the level reached after the Bac. It contains the attributes “REGROUPEMENT_BAC”, “NIVEAU_APRES_BAC”, “LIBELLE_REGIME”, “CONTINENT”, “LIBELLE_COURT_COMPOSANTE”, “LIB_DIPLOME”, “NIVEAU_DANS_LE_DIPLOME”, and its anonymisation causes the loss of 3,933 entries.
— The file ‘up13_annees_etapes.csv’ concerns enrolment in the diploma steps year after year. It contains the attributes “ANNEE_INSCRIPTION”, “LIBELLE_COURT_COMPOSANTE”, “NIVEAU_APRES_BAC”, “LIB_DIPLOME”, “CODE_ETAPE”, and its anonymisation causes the loss of 3,532 entries.
Other tables extracted from the same initial data and constructed using the same method of anonymisation can be provided on request (specify the desired columns).
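The k-anonymisation step described above can be illustrated with a minimal Python sketch. It is not the university's actual code: the function name `k_anonymise` and the sample rows are hypothetical, and the sketch only assumes what the description states, namely that a few attributes are selected and that any entry whose combination of values appears fewer than k = 5 times is deleted.

```python
from collections import Counter

def k_anonymise(rows, columns, k=5):
    """Keep only `columns`, then drop entries whose value combination occurs fewer than k times.

    Returns the kept entries and the number of entries lost.
    """
    # Project every entry onto the chosen attributes.
    projected = [tuple(row[c] for c in columns) for row in rows]
    # Count how many times each identical projected entry occurs.
    counts = Counter(projected)
    kept = [dict(zip(columns, values)) for values in projected if counts[values] >= k]
    return kept, len(projected) - len(kept)

# Made-up sample: 6 identical entries and 2 rare ones (illustrative values only).
rows = (
    [{"ANNEE_INSCRIPTION": 2006, "REGROUPEMENT_BAC": "S", "NIVEAU_APRES_BAC": 1}] * 6
    + [{"ANNEE_INSCRIPTION": 2006, "REGROUPEMENT_BAC": "L", "NIVEAU_APRES_BAC": 3}] * 2
)
kept, lost = k_anonymise(rows, ["REGROUPEMENT_BAC", "NIVEAU_APRES_BAC"], k=5)
print(len(kept), "entries kept,", lost, "entries lost")
```

In this sketch the two rare entries are removed because their combination of values appears only twice. The choice of columns drives the loss: the more attributes are kept, the more combinations fall below k.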
A second set of resources follows students year after year, from one diploma step to the next. In this dataset, we call **trace** the sequence of steps a student followed once the registration years have been dropped, so that only the order remains. We call **cursus** the same succession of steps with the registration years kept. For anonymisation, we grouped identical traces (or identical cursus) and, whenever a group contained fewer than 10 students, we do not publish its size or, equivalently, we set it to 1 (the information retained being that at least one student left this trace or followed this cursus). As a result, a number of overly specific study paths are forgotten, with a single one kept as a witness.
Starting from 106,088 traces or cursus, we produce the following resources.
— The file ‘up13_traces.csv’ contains the sequence of diploma step codes (a trace), and anonymisation makes us forget 10,089 traces.
— The file ‘up13_traces_wt_etape.csv’ contains similar traces, but without the step code. That is to say, only the diploma, the level after baccalaureate and the component concerned remain. Anonymisation makes us forget 4,447 traces.
— The file ‘up13_traces_bac_wt_etape.csv’ contains the same data as in the file ‘up13_traces_wt_etape.csv’ but also with the Bac type. Anonymisation makes us forget 8,067 traces.
— The file ‘up13_cursus_wt_etape.csv’ contains the same data as the file ‘up13_traces_wt_etape.csv’, with the registration years added. Anonymisation makes us forget 8,324 cursus.
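The grouping of traces described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used: the function names `build_traces` and `publish_counts`, the input layout (student, year, step code) and the step codes "A1", "A2", "B1" are all hypothetical. The sketch drops the registration year after ordering each student's steps, groups identical traces, and replaces any group size below 10 with 1.

```python
from collections import Counter, defaultdict

def build_traces(enrolments):
    """Turn (student, year, step_code) records into one trace per student.

    A trace is the tuple of step codes in chronological order, with the years dropped.
    """
    by_student = defaultdict(list)
    for student, year, step in enrolments:
        by_student[student].append((year, step))
    return [
        tuple(step for _, step in sorted(pairs, key=lambda p: p[0]))
        for pairs in by_student.values()
    ]

def publish_counts(traces, threshold=10):
    """Group identical traces; any group smaller than `threshold` is published with size 1."""
    counts = Counter(traces)
    return {trace: (n if n >= threshold else 1) for trace, n in counts.items()}

# Made-up sample: 12 students follow A1 then A2, 3 students only take B1.
enrolments = []
for s in range(12):
    enrolments += [(s, 2006, "A1"), (s, 2007, "A2")]
for s in range(12, 15):
    enrolments += [(s, 2008, "B1")]

published = publish_counts(build_traces(enrolments))
print(published)
```

Here the common trace keeps its true size of 12, while the rare trace is published with size 1, so it remains visible as a witness without revealing how many students followed it. Keeping the year in each element of the tuple instead of dropping it would give the cursus variant.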