How to Create a DataFrame in Python

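A DataFrame is a two-dimensional collection of data: a data structure that stores data in tabular form, arranged in rows and columns. A single DataFrame can hold several sets of data side by side, and you can perform many operations on it, such as selecting rows or columns and adding new rows or columns.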
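As a rough illustration of selecting and adding rows and columns, here is a minimal sketch. The sample data and the column name AgeNextYear are made up for this example; they do not come from any particular dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Name": ["Tom", "Joseph", "Krish"], "Age": [20, 21, 19]})

# Select a single column (returns a Series).
ages = df["Age"]

# Select a single row by its integer position.
first_row = df.iloc[0]

# Add a new column computed from an existing one.
df["AgeNextYear"] = df["Age"] + 1

# Add a new row by concatenating a one-row DataFrame.
new_row = pd.DataFrame({"Name": ["John"], "Age": [18], "AgeNextYear": [19]})
df = pd.concat([df, new_row], ignore_index=True)

print(df)
```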

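In Python, the DataFrame is the central component of the Pandas library and serves as a general-purpose two-dimensional data container. Like a table, it labels both its rows and its columns with indexes, so every value can be addressed unambiguously. Each column can hold a different data type, which makes a DataFrame flexible enough to handle complex datasets.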

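Pandas DataFrames offer a wide range of functionality. From building structured data out of dictionaries or other data structures to label-based indexing for convenient data access, Pandas makes data manipulation straightforward. The library provides an intuitive interface for tasks such as filtering rows by a condition, grouping data for aggregation, and performing statistical analysis.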
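The three tasks named above (filtering, grouping, and basic statistics) might look like the following sketch. The Team and Score columns are hypothetical sample data chosen for this example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "Team": ["A", "A", "B", "B"],
    "Score": [90, 70, 60, 80],
})

# Filter rows by a condition.
high = df[df["Score"] >= 80]

# Group rows and aggregate each group.
mean_by_team = df.groupby("Team")["Score"].mean()

# Basic statistics for a numeric column.
overall_mean = df["Score"].mean()

print(high)
print(mean_by_team)
print(overall_mean)
```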

A DataFrame can also be loaded from external storage, such as an SQL database, a CSV file, or an Excel file. It can likewise be built from in-memory objects such as lists, dictionaries, and lists of dictionaries.
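As a small sketch of loading from external storage, the example below reads CSV data with pd.read_csv. To keep it self-contained, an in-memory string stands in for a real file; the contents are made up. Excel files and SQL databases have their own readers (pd.read_excel and pd.read_sql), which need extra dependencies or a database connection and are not shown here.

```python
import io

import pandas as pd

# In-memory text standing in for the contents of a CSV file (made-up data).
csv_text = "Name,Age\nTom,20\nJoseph,21\n"

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```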

This tutorial covers several ways to create a DataFrame. Let's go through them one by one.

First, install the pandas library in your Python environment.
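pandas is commonly installed with pip (pip install pandas), although other package managers work too. As a quick check that the installation succeeded, the sketch below imports the library and prints its version. The exact version string depends on your environment.

```python
# Run `pip install pandas` in a terminal first (assumes pip is available).
import pandas as pd

# If the import succeeds, pandas is installed; the version varies by environment.
print(pd.__version__)
```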

Method - 1: Create an empty DataFrame

You can create a basic empty DataFrame by calling the DataFrame constructor with no arguments. Consider the following example.

Example -

    # Import the pandas library under the alias pd
    import pandas as pd

    # Call the DataFrame constructor with no arguments
    df = pd.DataFrame()

    # Print the DataFrame
    print(df)

Output:

    Empty DataFrame
    Columns: []
    Index: []
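An empty DataFrame can also declare its column names up front, which is handy when rows will be added later. The following is a minimal sketch; the column names are hypothetical.

```python
import pandas as pd

# An empty DataFrame that already declares its column names
# (the names here are hypothetical).
df = pd.DataFrame(columns=["Name", "Age"])

print(df.empty)          # True: there are no rows yet
print(list(df.columns))  # ['Name', 'Age']
```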

Method - 2: Create a DataFrame from a list

You can create a DataFrame from a single list or from a list of lists. Consider the following example.

Example -

    # Import the pandas library under the alias pd
    import pandas as pd

    # A list of string values
    lst = ['Java', 'Python', 'C', 'C++', 'JavaScript', 'Swift', 'Go']

    # Call the DataFrame constructor on the list
    dframe = pd.DataFrame(lst)

    # Print the DataFrame
    print(dframe)

Output:

                0
    0        Java
    1      Python
    2           C
    3         C++
    4  JavaScript
    5       Swift
    6          Go

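Explanation:

Import Pandas: import pandas as pd imports the Pandas library and gives it the alias pd for brevity.
Create the list: lst is a list of string values naming programming languages.
Build the DataFrame: pd.DataFrame(lst) builds a DataFrame from the list lst. When a single flat list is given, Pandas creates a DataFrame with a single column, labeled 0 by default.
Print the DataFrame: print(dframe) prints the resulting DataFrame.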
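The example above uses a single flat list. For the list-of-lists case, a minimal sketch might look like this: each inner list becomes one row. The column names Language and Year are chosen for this illustration only.

```python
import pandas as pd

# Each inner list becomes one row (sample data for illustration).
rows = [["Java", 1995], ["Python", 1991], ["Go", 2009]]

# Column names are supplied explicitly; without them pandas uses 0, 1, ...
dframe = pd.DataFrame(rows, columns=["Language", "Year"])
print(dframe)
```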

Method - 3: Create a DataFrame from a dict of ndarrays/lists

You can create a DataFrame from a dict of ndarrays or lists. All of the ndarrays must have the same length. The index defaults to range(n), where n is the array length. Consider the following example.

Example -

    # Import the pandas library under the alias pd
    import pandas as pd

    # Column data as lists of equal length
    data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}

    # Create the DataFrame
    df = pd.DataFrame(data)

    # Print the DataFrame
    print(df)

Output:

         Name  Age
    0     Tom   20
    1  Joseph   21
    2   Krish   19
    3    John   18

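Explanation:

Import Pandas: import pandas as pd imports the Pandas library under the alias pd.
Create the dictionary: data is a dictionary whose keys are the column names ('Name' and 'Age') and whose values are lists holding the data for each column.
Build the DataFrame: pd.DataFrame(data) builds a DataFrame from the dictionary. The keys become the column names and the lists become the columns.
Print the DataFrame: print(df) prints the resulting DataFrame.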
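The sketch below illustrates two points from this method: the values may be NumPy arrays instead of lists, and inputs of different lengths are rejected. The sample data and the variable name length_error are made up for this example.

```python
import numpy as np
import pandas as pd

# Values may be NumPy arrays as well as lists (made-up sample data).
data = {
    "Name": np.array(["Tom", "Joseph", "Krish"]),
    "Age": np.array([20, 21, 19]),
}
df = pd.DataFrame(data)
print(list(df.index))  # default index: [0, 1, 2]

# Arrays of different lengths are rejected with a ValueError.
length_error = None
try:
    pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [20]})
except ValueError as exc:
    length_error = exc
print("Rejected:", length_error is not None)
```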

Method - 4: Create an indexed DataFrame using arrays

The following example creates a DataFrame with a custom index from arrays.

Example -

    # Create a DataFrame with a custom index
    import pandas as pd

    # Column data as lists
    data = {'Name': ['Renault', 'Duster', 'Maruti', 'Honda City'],
            'Ratings': [9.0, 8.0, 5.0, 3.0]}

    # Create the DataFrame with explicit row labels
    df = pd.DataFrame(data, index=['position1', 'position2', 'position3', 'position4'])

    # Print the DataFrame
    print(df)

Output:

                     Name  Ratings
    position1     Renault      9.0
    position2      Duster      8.0
    position3      Maruti      5.0
    position4  Honda City      3.0

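Explanation:

Import Pandas: import pandas as pd imports the Pandas library under the alias pd.
Create the dictionary: data is a dictionary whose keys are the column names ('Name' and 'Ratings') and whose values are lists holding the data for each column.
Build the DataFrame: pd.DataFrame(data, index=['position1', 'position2', 'position3', 'position4']) builds a DataFrame from the dictionary. The list passed as index supplies the row labels.
Print the DataFrame: print(df) prints the resulting DataFrame.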
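One reason to set a custom index is label-based lookup. The following minimal sketch uses a shortened version of the data above and reads rows and values back with .loc.

```python
import pandas as pd

# A shortened version of the data from the example above.
data = {"Name": ["Renault", "Duster"], "Ratings": [9.0, 8.0]}
df = pd.DataFrame(data, index=["position1", "position2"])

# Look up a whole row by its label.
print(df.loc["position2"])

# Look up a single value by row label and column name.
print(df.loc["position1", "Ratings"])
```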

Method - 5: Create a DataFrame from a list of dicts

You can pass a list of dictionaries as the input data to create a Pandas DataFrame. By default, the dictionary keys are used as the column names. Consider the following example.

Example -

    # Create a pandas DataFrame from a list of dicts
    import pandas as pd

    # Each dictionary in the list describes one row
    data = [{'A': 10, 'B': 20, 'C': 30}, {'x': 100, 'y': 200, 'z': 300}]

    # Create the DataFrame
    df = pd.DataFrame(data)

    # Print the DataFrame
    print(df)

Output:

          A     B     C      x      y      z
    0  10.0  20.0  30.0    NaN    NaN    NaN
    1   NaN   NaN   NaN  100.0  200.0  300.0

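Explanation:

Import Pandas: import pandas as pd imports the Pandas library under the alias pd.
Create the list of dictionaries: data is a list in which each dictionary represents one row of the DataFrame. The dictionary keys become the column names.
Build the DataFrame: pd.DataFrame(data) builds a DataFrame from the list of dictionaries. The keys become the columns and the values become the data; a key that is missing from a row is filled with NaN.
Print the DataFrame: print(df) prints the resulting DataFrame.

Let's look at another example, which creates a pandas DataFrame from a list of dictionaries with both a row index and column labels.

Example - 2: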

    # Import the pandas library under the alias pd
    import pandas as pd

    # Each dictionary in the list describes one row
    data = [{'x': 1, 'y': 2}, {'A': 15, 'B': 17, 'C': 19}]

    # Two column labels, both matching dictionary keys
    dframe1 = pd.DataFrame(data, index=['first', 'second'], columns=['x', 'y'])

    # Two column labels, one of which ('y1') matches no dictionary key
    dframe2 = pd.DataFrame(data, index=['first', 'second'], columns=['x', 'y1'])

    # Print the first DataFrame, followed by a blank line
    print(dframe1, '\n')

    # Print the second DataFrame
    print(dframe2)

Output:

              x    y
    first   1.0  2.0
    second  NaN  NaN

              x   y1
    first   1.0  NaN
    second  NaN  NaN

Explanation:

The pandas library is used to create two separate DataFrames, dframe1 and dframe2, from the same list of dictionaries, data. Each dictionary describes one row: its keys correspond to column names and its values are the data. The first DataFrame, dframe1, is created with explicit row labels ('first' and 'second') and column labels ('x' and 'y'). The second, dframe2, is built from the same data but with the column labels 'x' and 'y1'. Because no dictionary has a 'y1' key, that column contains only NaN. The keys 'A', 'B', and 'C' do not appear in either DataFrame because they are not listed in columns. The code ends by printing both DataFrames, which makes the effect of the columns argument visible.

Example - 3:

    # Create a pandas DataFrame by passing a list of
    # dictionaries together with row labels
    import pandas as pd

    # Each dictionary in the list describes one row
    data = [{'x': 2, 'z': 3}, {'x': 10, 'y': 20, 'z': 30}]

    # Create the DataFrame with explicit row labels
    dframe = pd.DataFrame(data, index=['first', 'second'])

    # Print the DataFrame
    print(dframe)

Output:

             x     y   z
    first    2   NaN   3
    second  10  20.0  30

Explanation:

In this Python code, a Pandas DataFrame is built by passing a list of dictionaries together with row labels. It begins by importing the pandas library under the alias 'pd' for brevity. Next, a list of dictionaries named data is defined, in which each dictionary specifies one row of the DataFrame. The keys in these dictionaries are the column names and the associated values are the data.

The DataFrame, named dframe, is created with the pd.DataFrame() constructor, which takes the data and explicitly sets the row labels to 'first' and 'second'. The resulting DataFrame has a uniform layout with the columns 'x', 'y', and 'z'. Missing values are shown as 'NaN'.
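To inspect or handle the missing values in a DataFrame like this one, something along the following lines can help. This is a sketch only: filling with 0 is an arbitrary choice made for illustration, and the variable name filled is hypothetical.

```python
import pandas as pd

data = [{"x": 2, "z": 3}, {"x": 10, "y": 20, "z": 30}]
dframe = pd.DataFrame(data, index=["first", "second"])

# Count the missing values in each column.
print(dframe.isna().sum())

# A column that contains NaN is stored as float, which is why
# 'y' prints as 20.0 while 'x' and 'z' stay integers.
print(dframe.dtypes)

# Replace missing values with a placeholder (0 is an arbitrary choice).
filled = dframe.fillna(0)
print(filled)
```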

Method - 6: Create a DataFrame using the zip() function

The zip() function pairs up the elements of two lists, which makes it a convenient way to merge them. Consider the following example.

Example -

    # Create a pandas DataFrame from two lists using zip()
    import pandas as pd

    # First list
    Name = ['tom', 'krish', 'arun', 'juli']

    # Second list
    Marks = [95, 63, 54, 47]

    # Pair up the two lists element by element with zip()
    list_tuples = list(zip(Name, Marks))

    # Print the list of tuples
    print(list_tuples)

    # Convert the list of tuples into a pandas DataFrame
    dframe = pd.DataFrame(list_tuples, columns=['Name', 'Marks'])

    # Print the DataFrame
    print(dframe)

Output:

    [('tom', 95), ('krish', 63), ('arun', 54), ('juli', 47)]
        Name  Marks
    0    tom     95
    1  krish     63
    2   arun     54
    3   juli     47

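Explanation:

This Python code shows how to create a Pandas DataFrame from two lists, Name and Marks, using the pandas library and the zip() function. After pandas is imported, the two lists are defined; they hold the values for the two columns of the DataFrame. The zip() function pairs up the corresponding elements of these lists into tuples, which are collected in a new list called list_tuples.

The code then prints the list of tuples to show the combined data. Finally, a Pandas DataFrame named dframe is created with the pd.DataFrame() constructor, which turns the list of tuples into a tabular structure. The column names 'Name' and 'Marks' are assigned explicitly through the columns argument.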
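One behavior of zip() worth keeping in mind is that it stops at the shortest input, so a length mismatch silently drops data instead of raising an error. The sketch below demonstrates this with deliberately mismatched, made-up lists.

```python
import pandas as pd

names = ["tom", "krish", "arun"]
marks = [95, 63]  # deliberately one element short

# zip() stops at the shortest input, so 'arun' is silently dropped.
pairs = list(zip(names, marks))
dframe = pd.DataFrame(pairs, columns=["Name", "Marks"])
print(dframe)
```

If a mismatch should be treated as an error, Python 3.10 and later accept zip(names, marks, strict=True), which raises a ValueError when the lengths differ.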

Method - 7: Create a DataFrame from a dict of Series

You can create a DataFrame by passing a dictionary of Series. The resulting index is the union of the indexes of all the Series passed. Consider the following example.

Example -

    # Create a pandas DataFrame from a dict of Series
    import pandas as pd

    # Each Series holds one column; its index supplies the row labels
    d = {'Electronics': pd.Series([97, 56, 87, 45],
                                  index=['John', 'Abhinay', 'Peter', 'Andrew']),
         'Civil': pd.Series([97, 88, 44, 96],
                            index=['John', 'Abhinay', 'Peter', 'Andrew'])}

    # Create the DataFrame
    dframe = pd.DataFrame(d)

    # Print the DataFrame
    print(dframe)

Output:

             Electronics  Civil
    John              97     97
    Abhinay           56     88
    Peter             87     44
    Andrew            45     96

Explanation:

In this Python code, a Pandas DataFrame is created from a dictionary of Series. The two subjects, 'Electronics' and 'Civil', become the columns, and the individual scores, each with an explicit index label, are aligned by label into the DataFrame named dframe. The resulting table is then printed to the console.
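In the example above both Series share the same index, so the union is not visible. The sketch below uses two Series whose indexes only partly overlap (the scores are made up) to show how the labels are combined and where NaN appears.

```python
import pandas as pd

# Two Series whose indexes only partly overlap (made-up scores).
d = {
    "Electronics": pd.Series([97, 56], index=["John", "Peter"]),
    "Civil": pd.Series([88, 44], index=["Peter", "Andrew"]),
}
dframe = pd.DataFrame(d)

# The row index contains every label from either Series;
# positions with no value are filled with NaN.
print(dframe)
```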

In this tutorial, we discussed several ways to create a DataFrame.