In light of changing regulations, views, and data storage solutions in recent years, OpenAI, the creator of ChatGPT, updated its privacy policies in late 2024 to better inform users about its data controllers, what data is collected, and how that data is used and retained.
Primary Sources
In its privacy policy, OpenAI states that three primary sources of information are used to power the company’s foundation models (the language models trained on data sets):
- Data that is publicly available on the internet
- Third-party partner data
- Information provided by human trainers and researchers
Data Collected
OpenAI divides the kinds of information it collects into several categories:
- Account Information: the personal data users provide upon creating an account (e.g. account credentials, birthdays, contact information, etc.)
- User Content: user-made prompts and uploaded files (images, audio recordings, etc.)
- Communication Information: the contents of communications sent via email or social media comments
- Other Information You Provide: data provided during events or in surveys (e.g. age or identity data)
- Personal Data Received from Use of Services: technical information gleaned from a user’s activity, including:
  - Web browsers being used
  - Time zone
  - Content viewed
  - Operating system
  - Location or IP address
  - Account cookies
- Data Received from Other Sources: third-party data, including information about security and service threats, or customer marketing guidance
Data That Is Not Collected
To stay on the right side of the law, OpenAI avoids certain sources and types of data, including information stored behind paywalls and dark web data.
OpenAI policies also state that the company applies specialised filters to avoid content such as hate speech, spam, and adult content. As its software is intended for users aged 13 and over, any data submitted by persons under the age of 13 will be investigated and deleted.
Personal Data
How It Is Used
The data OpenAI receives is used in a variety of ways, including:
- Improving existing services (e.g. training large language models (“LLMs”)) or assisting in developing new products
- Analysing and responding to ChatGPT prompts
- Preventing misuse of services or fraudulent actions
- Complying with various legal obligations such as privacy rights or third-party requirements
- Communicating information with users about events or service updates
How It Is Not Used
Despite collecting enormous amounts of data on a regular basis, OpenAI has stated that it will not use data for certain purposes, including building profiles of users in order to contact them, advertise to them, or sell them anything, or selling the raw information itself.
Data Disclosure
A user’s personal data may be disclosed to several different parties, including:
- Contracted or partner vendors
- Counterparties assisting in a service transaction (e.g. a bankruptcy or receivership matter)
- Government authorities
- OpenAI account administrators or affiliates
- Third-party vendors and users
Data Retention
While OpenAI does not provide exact guidance on the length of time for which data will be stored (and even states on one help page that “ChatGPT does not copy or store training information in a database”), it does lay out the factors that determine how long data will be kept, which include:
- Legal requirements
- Potential acts of harm that could result from unauthorised use or disclosure
- The quantity, sensitivity, and nature of the data
- Service processing purposes
Deidentified Data
Some data may also be anonymised (“deidentified”), meaning it is de-linked from its original record and can no longer be identified with the source user. OpenAI may deidentify information but retain the data for the purposes of analysing and improving service offerings or conducting research.
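As a rough illustration of what de-linking a record can look like in practice, the following is a minimal Python sketch. It is not OpenAI’s method, which the policy does not describe. The field names (user_id, name, email, ip_address) and the deidentify function are hypothetical, chosen only for the example. Note that replacing an ID with a salted hash is strictly pseudonymisation; the record is only truly de-linked once the salt is discarded.

```python
import hashlib

# Hypothetical direct identifiers; not taken from any real OpenAI schema.
DIRECT_IDENTIFIERS = {"name", "email", "ip_address"}


def deidentify(record: dict, salt: bytes) -> dict:
    """Return a copy of `record` with direct identifiers dropped and
    the user ID replaced by a salted SHA-256 pseudonym."""
    # Drop the direct identifiers and the raw user ID.
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "user_id"
    }
    # Replace the user ID with a pseudonym that cannot be reversed
    # without knowing the salt.
    digest = hashlib.sha256(salt + str(record["user_id"]).encode("utf-8"))
    cleaned["pseudonym"] = digest.hexdigest()[:16]
    return cleaned
```

In this sketch the non-identifying fields (for example, usage statistics) survive for analysis, while the original record is left unmodified. A random salt could be generated with secrets.token_bytes(16) from the standard library.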
Legal Requests and Denials
Given the numerous legal jurisdictions in which it operates, OpenAI strives to ensure it complies with all local laws and protects personal information properly. Should it receive a data request that it considers unlawful, OpenAI may deny the request.
OAIC Privacy Considerations
In the context of Australia and the Privacy Act 1988, the Office of the Australian Information Commissioner (OAIC) published a report listing five key privacy considerations for Australian organisations that are considering using commercial AI products:
- Conduct due diligence to ensure the AI product is suitable for, and is being used for, its intended purpose
- Establish clear policies and procedures around usage, transparency, and proper privacy governance
- AI systems generating or inferring personal information must comply with Australian Privacy Principle 3 (APP 3) – Collection of Solicited Personal Information – demonstrating a clear business need
- Any personal information input into an AI system must be used or disclosed “for the primary purpose for which it was collected” in accordance with APP 6 (Use or Disclosure of Personal Information)
- As a matter of best practice, the OAIC recommends not entering personal or sensitive information into publicly available generative AI tools
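One way an organisation might act on the last recommendation is to strip obvious personal details from text before it is pasted into a generative AI tool. The Python sketch below is illustrative only and is not part of the OAIC guidance. The redact function and both patterns are assumptions made for the example; simple patterns like these catch only well-formed email addresses and phone numbers, and will miss names, addresses, and many other kinds of personal information.

```python
import re

# Illustrative patterns only; real personal data takes many more forms.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s()-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text
```

For example, redact("Contact jane.doe@example.com or +61 412 345 678 today") returns "Contact [EMAIL] or [PHONE] today". A filter like this is a safety net rather than a substitute for a policy on what staff may enter into such tools.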
Key Takeaways
As more and more publicly available data (personal, sensitive, or otherwise) is collected and repurposed to train the large language models behind generative AI, it is more important than ever to follow careful and secure data practices when using publicly accessible text generation tools.